ABSTRACT


This investigation is concerned with automatically making an array
of detectors form a beam in a desired direction in space when unknown
interfering noise is present, so as to maximize the output signal-to-noise
ratio (SNR) subject to a constraint on the super-gain ratio (Q-factor).
Tapped delay line structures, combined with iterative gradient techniques
to adjust the tap weights, are used to do this.

First, we investigate the relationship between viewing the detectors
as a "detector array" and viewing the detectors as a "multichannel filter."

Next, starting from the multichannel filter point of view we investi- 
gate the sensitivity of the SNR to random errors in the tap weight settings 
and random errors in our knowledge of the detector locations. Because this 
calculation is exceedingly difficult from the multichannel filter approach, we
will use the previously derived relationship to show that this sensitivity is 
essentially given by the super-gain ratio. We show that when we use linear 
arrays of detectors separated by one-half wavelength or less, this sensitivity 
factor may become very large when we use those currents and phases (or 
tap weights) which maximize the SNR, thus indicating that we should not try
to design our detector pattern or multichannel filter coefficients on the basis
of maximizing the SNR alone, but rather on the basis of maximizing the SNR
subject to a constraint on the super-gain ratio.

We then develop a computationally fast numerical method of finding 
the optimum excitations which maximize the SNR subject to a super-gain
ratio constraint when the interfering noise is known. 

Next, we try to analyze adaptive algorithms which maximize the SNR
subject to a constraint on the super-gain ratio when unknown interfering
noise is present, but because the SNR and super-gain ratio are nonlinear
quantities, it turns out to be exceedingly difficult to prove convergence
of the algorithms to the optimal solution, or to find the algorithms'
rates of convergence. Thus, solely for the purpose of mathematical tract- 
ability, we consider adaptive algorithms which minimize the mean square 
error (MSE) subject to a linear constraint. 

Finally we present the results of computer simulations of algorithms 
which maximize the SNR subject to a constraint on the super-gain ratio when
unknown interfering noise is present. 


CHAPTER 1

INTRODUCTION

This investigation is concerned with the optimal design of a detector
array and signal processor to maximize the output signal-to-noise ratio
(SNR) subject to a constraint on the super-gain ratio (Q-factor). We will
present and analyze an iterative gradient projection technique to achieve
this optimal design even when the noise statistics are unknown to the
designer a priori.

Some of the motivations for undertaking our study at the present 
time are: 

1. The recent ability to approximate the sophisticated processing
required through the use of fast, special-purpose digital computers.

2. The recent use of channels, such as are present in spacecraft
and underwater communications, where the additive noise from spatially
distributed noise sources predominates over the additive receiver noise.

3. The recent use of acoustic and seismic channels where the
low signal frequencies used result in long signal and noise wavelengths
(relative to array size), and thus in high correlations between the noise
at the array elements, which in turn implies that we might achieve improved
performance through the use of array processing techniques.

4. The limited ability of design procedures based upon the class- 
ical concept of an antenna pattern to adequately satisfy the criteria of min- 
imum probability of error or minimum mean squared error or maximum 
SNR. 

The first three factors are self-explanatory. The last one deserves
some comment. Some of the advantages (and limitations) of the classical
antenna pattern approach to the design of array processors are:

1. The approach subdivides the system design problem into two
separate pieces. An antenna engineer designs the array (spatial processor)
and, independently, a communications engineer takes the single-channel
antenna output and designs the temporal processor to give, for example,
the best (in some sense) estimate of the transmitted signal. This would
seem to be an advantage; however, Gaarder has shown that this factoring
of the optimum processor into spatial and temporal processors is, in
general, impossible, and consequently processors designed on this
principle are suboptimum.

2. The concept of an antenna pattern assumes that we are dealing
with monochromatic or quasi-monochromatic fields. For the wideband
signals coming into use, there is no easy way of combining the various
frequency components together.

Previous researchers have considered the design of detector
arrays to maximize some criterion without constraints, both from the
"detector pattern" point of view and from the "multichannel filter" point
of view. More recently, investigators have devised adaptive algorithms
to enable a processing structure composed of tapped delay lines (such as
that shown in Fig. 6.2.1) to converge to an optimal structure even when the
noise statistics are unknown to the designer a priori. These algorithms are
similar to those used to adaptively equalize telephone and other dispersive
communication channels.

These previous authors have designed adaptive algorithms which
minimized the MSE, or maximized the SNR, by using iterative gradient
techniques to make the tap weights converge to values which optimize the
MSE or SNR in the steady state. Any individual tap weight usually
converges to its steady-state value in a manner similar to that shown in
Fig. 1.1 below.

Fig. 1.1 Convergence of an arbitrary tap weight to its steady-state value
In the steady state, each tap weight can be viewed as having a nominal
value plus a random variation about this nominal value. If we use the
unbiased algorithms of Widrow, Griffiths and Somin, the nominal value is
the same as the optimal value of the tap weight. However, a question that
immediately arises is the following: How sensitive is the SNR to the small
random variations in the tap weights about their nominal values?

In chapter three we will show that, depending upon the geometry of
the detector array, the SNR can be very sensitive to these small random
variations, and we will derive an expression for this sensitivity.


In order to derive the expression for the sensitivity, some reformulation
of what previous investigators have done, both from the "detector pattern"
point of view and from the "multichannel filter" point of view, will be
necessary. This will be covered in chapter two, where we will also
demonstrate that both approaches lead to the same results under a
monochromatic assumption, which is to be expected, since there is only one
physical problem. The reason for our reformulation is as follows: We will
be able to express the SNR in the form

$$ \frac{\underline{Z}^{*} P\, \underline{Z}}{\underline{Z}^{*} Q\, \underline{Z}} \quad \text{or} \quad \frac{\underline{I}^{*} C\, \underline{I}}{\underline{I}^{*} A\, \underline{I}} $$

where the vector $\underline{Z}$ represents the complex gains (or tap weights)
in the multichannel filter approach and the vector $\underline{I}$ represents
the excitation currents in the detector pattern. By the sensitivity of the
SNR to random errors in the tap weights, we mean that if we replace
$\underline{Z}$ by $\underline{Z}_N + \underline{Z}_R$, where N denotes the
nominal value and R denotes the random fluctuations about this nominal
value, the expected value of the SNR may turn out to be of the form

$$ E\left\{ \frac{\underline{Z}^{*} P\, \underline{Z}}{\underline{Z}^{*} Q\, \underline{Z}} \right\} = \frac{\underline{Z}_N^{*} P\, \underline{Z}_N}{\underline{Z}_N^{*} Q\, \underline{Z}_N} + \text{an additional term}, $$

and we then define the ratio of the additional term to the nominal term
as our sensitivity factor. However, using this approach, the calculation of
$E\{ \underline{Z}^{*} P \underline{Z} / \underline{Z}^{*} Q \underline{Z} \}$
is exceedingly complex. Instead, because we show in chapter two that the
detector pattern and multichannel filter approaches are interchangeable,
we will use the detector pattern approach and rewrite the SNR expression
above in terms of the power pattern, which in turn depends upon the
excitation currents, and then, by examining a picture of a typical power
pattern, we will be led by physical reasoning to approximate the sensitivity
of the SNR to random variations in the tap weights by the super-gain ratio,
which is a measure of the sensitivity of the power at the peak of the beam
to random errors in the detector excitations. In other words, instead of
saying that changes in the tap, weights cause changes in the SNR, we are 
now saying that changes in the tap weights cause changes in the peak of 
the power pattern which in turn is the main reason the SNR changes. Thus 
if we constrain changes in the peak of the power pattern we will also auto- 
matically constrain changes in the SNR. The advantage is that we can easily 
derive an expression for changes in the peak of the power pattern due to 
changes in the tap weights (or detector currents), whereas we cannot easily 
derive an expression for changes in the SNR due to changes in the tap weights. 

As mentioned before, we will show in chapter three that although, for
a particular array geometry (specifically a linear array of detectors
separated by half a wavelength, where the signal is impinging from endfire),
we might initially be led to believe that we can achieve very good
performance by setting (usually by means of an adaptive algorithm) the tap
weights equal to those values which maximize the SNR, if we also look at
the super-gain ratio, we will see that in practice we will not get this good
performance because of the extreme sensitivity of the SNR to the small
deviations in the tap weights from their optimal values.

After demonstrating this, section 3.2 goes on to answer the question
of how high an SNR we can get if we constrain the super-gain ratio to equal
some reasonable value. In order to do this we will extend the work of Lo,
Lee and Lee, who recently developed a numerical method of solving this problem.
Our contribution makes use of a state variable technique which enables us to 
reduce the numerical problem from one of finding the complex roots of a 
high order polynomial with complex coefficients (in all the specific numerical 
cases treated in the paper by Lo, Lee and Lee the coefficients of the poly- 
nomials were real, but this is not necessarily true in general) to one of find- 
ing eigenvalues of a real matrix which is considerably faster and easier to do. 

Next, we tried to analyze adaptive algorithms which would maximize
the SNR subject to a constraint on the super-gain ratio when unknown
interfering noise is present. Because the SNR and super-gain ratio are
nonlinear quantities, it turned out to be exceedingly difficult to prove
convergence of the algorithms to the optimum solution, or to find the
algorithms' rates of convergence. Thus, solely for the purpose of
mathematical tractability (the actual nonlinear problem will be simulated
on a computer in chapter six to obtain some numerical indication of
convergence and convergence rates), chapter four analyzes an adaptive
projection algorithm which minimizes the mean square error (MSE) subject
to a linear constraint. We prove that an algorithm of the form



$$ \underline{W}_{j+1} = \underline{W}_j - k\, P\, \nabla_{\underline{W}_j}(\mathrm{MSE}) $$

converges to the Lagrange solution in real time, with an easily expressible
bound on the convergence rate. Here $k$ is the step size, $P$ is a matrix
projection operator and $\nabla_{\underline{W}_j}(\mathrm{MSE})$ is the
gradient of the MSE with respect to $\underline{W}_j$. We also prove
convergence and find bounds on the rate of convergence when
$\nabla_{\underline{W}_j}(\mathrm{MSE})$ is (1) known exactly, (2) estimated,
and (3) estimated by a noisy estimate. Physically these cases correspond to
(1) knowing the interfering noise field exactly, (2) using the instantaneous
values of the noise that are present at the outputs of the detectors (or at
the outputs of each of the delay elements comprising our tapped delay lines)
as estimates of the noise correlation matrix, e.g. replacing
$E\{n_i(t)\, n_j(t)\}$ by $n_i(t_k)\, n_j(t_k)$ at iteration $k$, and
(3) accounting for self-noise in the detectors and tapped delay lines by
replacing $E\{n_i(t)\, n_j(t)\}$ by $n_i(t_k)\, n_j(t_k)$ plus an additive
white Gaussian noise term at iteration $k$.
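The projection idea can be illustrated with a minimal Python sketch. It is not the algorithm analyzed in chapter four; it assumes a real, two-weight quadratic MSE of the form $W^T R W - 2 d^T W$ with the gradient known exactly (case 1), and the names `R`, `d`, `c`, `a` and all numerical values are hypothetical choices made only for this example. The constraint is taken as $c^T W = a$, and the projection is assumed to be $P = I - c c^T / (c^T c)$.

```python
def project(c, v):
    """Apply P = I - c c^T / (c^T c) to v, removing the component of v along c."""
    scale = sum(ci * vi for ci, vi in zip(c, v)) / sum(ci * ci for ci in c)
    return [vi - scale * ci for ci, vi in zip(c, v)]


def mse_gradient(r, d, w):
    """Gradient of w^T R w - 2 d^T w for symmetric R, i.e. 2 (R w - d)."""
    n = len(w)
    return [2 * (sum(r[i][j] * w[j] for j in range(n)) - d[i]) for i in range(n)]


def projected_gradient(r, d, c, w0, k, iterations):
    """Iterate W <- W - k P grad(MSE); w0 is assumed to satisfy c^T w0 = a."""
    w = list(w0)
    for _ in range(iterations):
        step = project(c, mse_gradient(r, d, w))
        w = [wi - k * si for wi, si in zip(w, step)]
    return w


# Hypothetical problem: minimize 2*w1^2 + w2^2 - 2*w1 - 2*w2 subject to w1 + w2 = 1.
R = [[2.0, 0.0], [0.0, 1.0]]
d = [1.0, 1.0]
c = [1.0, 1.0]
w_final = projected_gradient(R, d, c, w0=[1.0, 0.0], k=0.1, iterations=200)
```

For this example the Lagrange solution can be found by hand to be $W = (1/3,\ 2/3)$. Because every step is projected onto the plane orthogonal to `c`, each iterate keeps the constraint value of the starting point.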

Chapter five is an investigation of an adaptive penalty algorithm
to minimize the MSE subject to a linear constraint. Specifically, we prove
that algorithms of the form

$$ \underline{W}_{j+1} = \underline{W}_j - k\, \nabla_{\underline{W}_j} \left[ \mathrm{MSE} + K \left( \underline{W}_j^{T} \underline{n}_1 - a \right)^{2} \right] $$

where $\underline{W}^{T} \underline{n}_1 = a$ is the equation defining the
linear constraint, converge to the Lagrange solution of chapter four if $K$
is infinite. For $K$ finite, a bias is found to exist, and is investigated,
along with bounds on the rates of convergence of these algorithms to their
steady-state values. Again we consider the same three ways of evaluating
$\nabla_{\underline{W}_j}(\mathrm{MSE})$.
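The bias that appears for a finite penalty constant can be seen in a small numerical sketch. This is only an illustration under assumed conditions: a real, two-weight quadratic MSE $W^T R W - 2 d^T W$, an exactly known gradient, and hypothetical values for `R`, `d`, the constraint vector `c`, the constraint level `a`, the penalty constants and the step sizes.

```python
def penalty_descent(r, d, c, a, penalty, k, iterations, w0):
    """Gradient descent on w^T R w - 2 d^T w + penalty * (c^T w - a)^2."""
    w = list(w0)
    n = len(w)
    for _ in range(iterations):
        violation = sum(ci * wi for ci, wi in zip(c, w)) - a
        grad = [
            2 * (sum(r[i][j] * w[j] for j in range(n)) - d[i])
            + 2 * penalty * violation * c[i]
            for i in range(n)
        ]
        w = [wi - k * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical problem whose exactly constrained (Lagrange) solution is (1/3, 2/3).
R = [[2.0, 0.0], [0.0, 1.0]]
d = [1.0, 1.0]
c = [1.0, 1.0]
a = 1.0

# The step size must shrink as the penalty grows, or the iteration diverges.
w_small = penalty_descent(R, d, c, a, penalty=10.0, k=0.02,
                          iterations=2000, w0=[0.0, 0.0])
w_large = penalty_descent(R, d, c, a, penalty=1000.0, k=0.0004,
                          iterations=40000, w0=[0.0, 0.0])

bias_small = w_small[0] + w_small[1] - a   # constraint violation for K = 10
bias_large = w_large[0] + w_large[1] - a   # constraint violation for K = 1000
```

For this problem the stationary point can be worked out by hand as $w_1 = (1+K)/(2+3K)$, $w_2 = 2 w_1$, so the constraint violation is $1/(2+3K)$: it is nonzero for every finite $K$ and vanishes only as $K$ grows without bound. The example also shows the practical cost, since a larger $K$ forces a smaller step size and many more iterations.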

In chapter six, we set up and present the results of a computer
simulation of the gradient projection algorithm which adaptively maximizes
the SNR subject to a constraint on the super-gain ratio. We then conclude
that when designing adaptive array processors one should either

1. Calculate the super-gain ratio for the geometry under
consideration for all possible incident signal directions and, if we are
sure that the super-gain ratio can never become intolerably high, feel free
to use the adaptive gradient algorithms proposed by previous authors, or

2. Use the constrained adaptive algorithms developed in this
investigation, which will assure us that we get the highest SNR possible
subject to a constraint on the super-gain ratio should the value of the
super-gain ratio exceed some preset value we have chosen.
CHAPTER 2

Equivalence Between "Detector Pattern" and "Multichannel Filter"
Viewpoints in Designing Optimum Arrays

In this chapter, we will consider the following problem: Given an
array of point detectors at known locations in space, how should we "design"
the array so as to maximize the output SNR? This problem has been solved
before; as a matter of fact, it has been solved twice before, once by antenna
engineers, who solved for those detector current excitations which maximized
the SNR through the use of the "detector pattern" concept, and again by
communication engineers, who viewed the array as a multichannel filter and
solved for those filter coefficients which maximized the SNR through the use
of statistical quantities such as the covariances of the signal and noise
fields.

As explained in more detail in the first chapter, we will reformulate 
what these previous investigations have done, and show that the two approaches 
are equivalent (i.e. lead to the same optimum value of the SNR under a mono- 
chromatic noise assumption) in order that we may, in chapter three, easily 
switch from the multichannel filter point of view to the detector pattern view- 
point when evaluating the sensitivity of the SNR to small random variations 
in the tap weights.

In section 2.1 we derive the optimum currents and the resulting
value of the SNR when these currents are used to excite the detector array.
All our results will be a function of the assumed incident noise power. In
section 2.2 we derive the optimum filter coefficients and the resulting
value of the SNR when these filter coefficients are used in the multichannel
filter. These results will be a function of the assumed noise space-time
correlation function. In section 2.3 we will express the space-time
correlation function used in section 2.2 as a direct function of the
incident noise power used in section 2.1 and then show that, under the
monochromatic noise assumption, the detector pattern approach and the
multichannel filter approach yield exactly the same value of the SNR, and
moreover, we will be able to see that the currents of section 2.1 correspond
to the filter coefficients of section 2.2. This analogy will be used in the
following chapter to construct a quantity which is defined in terms of
communication theory quantities (e.g. covariance), and corresponds to the
super-gain ratio of antenna theory.
Section 2.1 "Detector Pattern" Approach

The material in this section follows the approach of Lo, Lee and Lee (19).

Assume we have N isotropic detectors located at arbitrary positions
in space, specified by Cartesian coordinates
$\underline{r}_n = (x_n, y_n, z_n)$ as shown in Fig. 2.1.1.

Fig. 2.1.1 Detector Array

The current in the n-th detector will be denoted by $I_n$. Let us define

$$ \underline{I}^{*} = \left[\, I_1^{*} \;\; I_2^{*} \;\; \cdots \;\; I_N^{*} \,\right] \qquad (2.1.1) $$

where the asterisk denotes adjoint. The detector pattern is given by

$$ p(\theta, \phi) = \sum_{n=1}^{N} I_n^{*}\, e^{\,j k\, \underline{r}_o \cdot \underline{r}_n} \qquad (2.1.2) $$

where the $\underline{r}$'s are given by

$$ \underline{r}_o = \sin\theta \cos\phi\; \underline{x}_o + \sin\theta \sin\phi\; \underline{y}_o + \cos\theta\; \underline{z}_o $$

$$ \underline{r}_n = x_n\, \underline{x}_o + y_n\, \underline{y}_o + z_n\, \underline{z}_o = \text{the position of the n-th element.} $$

Since $k = 2\pi/\lambda$, we have

$$ k\, \underline{r}_o \cdot \underline{r}_n = 2\pi \left[ \frac{x_n}{\lambda} \sin\theta \cos\phi + \frac{y_n}{\lambda} \sin\theta \sin\phi + \frac{z_n}{\lambda} \cos\theta \right] $$
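We will define

$$ \psi_n \triangleq k\, \underline{r}_o \cdot \underline{r}_n = 2\pi \left[ \frac{x_n}{\lambda} \sin\theta \cos\phi + \frac{y_n}{\lambda} \sin\theta \sin\phi + \frac{z_n}{\lambda} \cos\theta \right] \qquad (2.1.3) $$

Equation (2.1.2) becomes

$$ p(\theta, \phi) = \sum_{i=1}^{N} I_i^{*}\, e^{\,j \psi_i} = \underline{I}^{*}\, \underline{V} \qquad (2.1.4) $$

where $\underline{V}$ is given by

$$ \underline{V} = \begin{bmatrix} e^{+j\psi_1} \\ \vdots \\ e^{+j\psi_N} \end{bmatrix} \qquad (2.1.5) $$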
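As an illustrative check of equations (2.1.3) through (2.1.5), the following Python sketch evaluates $\psi_n$, $\underline{V}$ and $p = \underline{I}^{*}\underline{V}$ for a hypothetical four-element line array along the z axis with half-wavelength spacing and unit currents. The function names, positions and wavelength are assumptions introduced for this example only.

```python
import cmath
import math


def phase(position, wavelength, theta, phi):
    """psi_n of (2.1.3) for an element at (x, y, z)."""
    x, y, z = position
    return (2 * math.pi / wavelength) * (
        x * math.sin(theta) * math.cos(phi)
        + y * math.sin(theta) * math.sin(phi)
        + z * math.cos(theta)
    )


def pattern_vector(positions, wavelength, theta, phi):
    """The vector V of (2.1.5): elements exp(+j psi_n)."""
    return [cmath.exp(1j * phase(p, wavelength, theta, phi)) for p in positions]


def pattern(currents, positions, wavelength, theta, phi):
    """p(theta, phi) = I^* V of (2.1.4); the currents enter conjugated."""
    v = pattern_vector(positions, wavelength, theta, phi)
    return sum(i.conjugate() * vn for i, vn in zip(currents, v))


# Hypothetical array: four elements on the z axis, spacing 0.5 wavelength.
positions = [(0.0, 0.0, 0.5 * n) for n in range(4)]
currents = [1 + 0j] * 4

broadside = pattern(currents, positions, 1.0, math.pi / 2, 0.0)
endfire = pattern(currents, positions, 1.0, 0.0, 0.0)
```

With equal currents the four contributions add in phase at broadside ($\theta = \pi/2$), while along the array axis ($\theta = 0$) successive elements differ in phase by $\pi$ and cancel in pairs.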


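If we assume the normalized signal is incident from direction
$(\theta_s, \phi_s)$, then the received signal power is given by

$$ S = \iint |p(\theta, \phi)|^{2}\, \delta(\theta - \theta_s,\ \phi - \phi_s)\, d\Omega = \left| \underline{I}^{*}\, \underline{V}_1 \right|^{2} \qquad (2.1.6) $$

where

$$ \underline{V}_1^{*} = \left[\, e^{-j\psi_1^{o}} \;\; \cdots \;\; e^{-j\psi_N^{o}} \,\right] \qquad (2.1.7) $$

and

$$ \psi_n^{o} = 2\pi \left[ \frac{x_n}{\lambda} \sin\theta_s \cos\phi_s + \frac{y_n}{\lambda} \sin\theta_s \sin\phi_s + \frac{z_n}{\lambda} \cos\theta_s \right] \qquad (2.1.8) $$

Define the matrix C by

$$ \underline{V}_1\, \underline{V}_1^{*} = \left[\, e^{\,j(\psi_k^{o} - \psi_i^{o})} \,\right]_{k,i = 1, \ldots, N} = C \qquad (2.1.9) $$

Note that C is a Hermitian positive semidefinite matrix (a dyad of rank one).

Proof: $\underline{x}^{*} C\, \underline{x} = \underline{x}^{*} \underline{V}_1 \underline{V}_1^{*} \underline{x} = \left| \underline{x}^{*} \underline{V}_1 \right|^{2} \geq 0$ for every $\underline{x}$.

Thus

$$ S = \underline{I}^{*} C\, \underline{I} \qquad (2.1.10) $$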
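Let us assume that the spatial distribution of the noise power is given by
$T(\theta, \phi)$. Then the noise power received is

$$ N = \iint |p(\theta, \phi)|^{2}\, T(\theta, \phi)\, d\Omega \qquad (2.1.11) $$

$$ \phantom{N} = \iint \underline{I}^{*}\, \underline{V}\, \underline{V}^{*}\, \underline{I}\; T(\theta, \phi)\, d\Omega $$

Since the currents $I_n$ are not functions of $\theta$ or $\phi$,

$$ N = \underline{I}^{*} \left[ \iint \underline{V}\, \underline{V}^{*}\, T(\theta, \phi)\, d\Omega \right] \underline{I} $$

Define the matrix A by

$$ N = \underline{I}^{*} A\, \underline{I} \qquad (2.1.12) $$

where the elements of the matrix A are given by

$$ a_{ki} = \iint e^{\,j\psi_k}\, e^{-j\psi_i}\, T(\theta, \phi)\, d\Omega $$

The matrix A is positive definite.

Proof:

$$ \underline{x}^{*} A\, \underline{x} = \iint \underline{x}^{*}\, \underline{V}\, \underline{V}^{*}\, \underline{x}\; T(\theta, \phi)\, d\Omega = \iint \left[ \underline{x}^{*} \underline{V} \right] \left[ \underline{V}^{*} \underline{x} \right] T(\theta, \phi)\, d\Omega $$

Because $T(\theta, \phi)$ is always positive, we may write it as

$$ T(\theta, \phi) = g(\theta, \phi)\, g^{*}(\theta, \phi) \quad \text{where } g \text{ and } g^{*} \text{ are scalars.} $$

Thus

$$ \underline{x}^{*} A\, \underline{x} = \iint \left[ g\, \underline{x}^{*} \underline{V} \right] \left[ g\, \underline{x}^{*} \underline{V} \right]^{*} d\Omega = \iint \left| g\, \underline{x}^{*} \underline{V} \right|^{2} d\Omega $$

Since the integrand is positive,

$$ \underline{x}^{*} A\, \underline{x} > 0 \quad \text{if } \underline{x} \neq 0 $$

QED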

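The signal-to-noise ratio (SNR) is then

$$ \mathrm{SNR} = \frac{\underline{I}^{*} C\, \underline{I}}{\underline{I}^{*} A\, \underline{I}} \qquad (2.1.13) $$

We may use the calculus of variations to find the value of $\underline{I}$
which maximizes the SNR. From Appendix A,

$$ \underline{I}_{\text{optimum}} = A^{-1}\, \underline{V}_1 \qquad (2.1.14) $$

The value of the SNR when $\underline{I} = \underline{I}_{\text{opt}}$ is

$$ \mathrm{SNR} = \frac{\underline{I}_{\text{opt}}^{*} C\, \underline{I}_{\text{opt}}}{\underline{I}_{\text{opt}}^{*} A\, \underline{I}_{\text{opt}}} = \underline{V}_1^{*} A^{-1}\, \underline{V}_1 $$

The best SNR that we can achieve by using the "detector pattern"
approach to the problem of optimizing the SNR is thus

$$ \mathrm{SNR} = \underline{V}_1^{*} A^{-1}\, \underline{V}_1 \qquad (2.1.15) $$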
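Equations (2.1.13) through (2.1.15) can be checked numerically on a small case. The sketch below assumes a hypothetical two-element array with a made-up Hermitian positive definite noise matrix `A` and signal vector `V1`; neither comes from the text, and the closed-form 2x2 inverse is used only to keep the example free of external libraries.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product M v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def inner(x, y):
    """x^* y, the adjoint of x applied to y."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def snr(currents, v1, a):
    """(I^* C I) / (I^* A I) with C = V1 V1^*, as in (2.1.13)."""
    signal = abs(inner(currents, v1)) ** 2
    noise = inner(currents, matvec(a, currents)).real
    return signal / noise


# Hypothetical two-element example (values are assumptions).
A = [[1.0, 0.5], [0.5, 1.0]]
V1 = [1 + 0j, 1j]

I_opt = matvec(inverse_2x2(A), V1)      # equation (2.1.14)
snr_opt = snr(I_opt, V1, A)             # SNR achieved by the optimum currents
snr_closed_form = inner(V1, I_opt).real # V1^* A^{-1} V1, equation (2.1.15)
snr_matched = snr(V1, V1, A)            # currents simply matched to the signal
```

For these values $\underline{V}_1^{*} A^{-1} \underline{V}_1 = 8/3$, which the optimum currents attain, whereas currents set equal to $\underline{V}_1$ (ignoring the noise correlation) give an SNR of 2.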

We will now find an expression for the best SNR we can achieve 
by using the multichannel filter approach to the problem of optimizing 
the SNR and then show under what conditions the two approaches yield 
the same value for the best SNR.
Section 2.2 Multichannel Filter Approach

Assuming that we know the noise space-time correlation function,
let us now find the optimum multichannel filter, optimum in the sense that
we will find the $z_i$'s (see Fig. 2.2.1) which maximize the SNR. Once the
coefficients of the optimum filter have been found, we will be able to write
an expression for the best SNR we can achieve through the use of the
multichannel filter approach.

The material in this section follows the approach of Edelblute, Fisk
and Kinnison.

Fig. 2.2.1 Multichannel filter structure

The SNR at the multichannel filter output when
$\underline{s}(t) + \underline{n}(t)$ is received is given (under the
assumption that the signal and noise are complex uncorrelated random
waveforms) by

$$ \mathrm{SNR} = \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} z_i^{*}\, z_j\, p_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{N} z_i^{*}\, z_j\, q_{ij}} = \frac{\underline{Z}^{*} P\, \underline{Z}}{\underline{Z}^{*} Q\, \underline{Z}} \qquad (2.2.1) $$

where

$$ E\{ s_i^{*}(t)\, n_j(t) \} = 0 \qquad (2.2.2) $$

$$ p_{ij} = E\{ s_i^{*}(t)\, s_j(t) \} \qquad (2.2.3) $$

$$ q_{ij} = E\{ n_i^{*}(t)\, n_j(t) \} \qquad (2.2.4) $$

$$ \underline{Z} = \begin{bmatrix} z_1 \\ \vdots \\ z_N \end{bmatrix} \qquad (2.2.5) $$


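Note that P and Q are correlation matrices and thus are Hermitian
positive semidefinite. (We will assume that Q is positive definite, which is
generally true in practice: the Q matrix is usually of the form $aI$ plus a
second correlation matrix, where the $aI$ term is due to additive self-noise
at each detector, thus guaranteeing the existence of $Q^{-1}$.)

Note the similarity between equation (2.2.1) and equation (2.1.13).
Also note that the SNR is independent of the magnitude of $\underline{Z}$.
Let us now find the value of $\underline{Z}$ that maximizes the SNR by using
the calculus of variations, i.e.

$$ \text{maximize} \quad L = \frac{\underline{Z}^{*} P\, \underline{Z}}{\underline{Z}^{*} Q\, \underline{Z}} $$

This equation is of the same form as equation (A1) of Appendix A.
By the same reasoning as in section 2.1 (see equation 2.1.15) we have

$$ P\, \underline{Z}_o \left( \underline{Z}_o^{*} Q\, \underline{Z}_o \right) - Q\, \underline{Z}_o \left( \underline{Z}_o^{*} P\, \underline{Z}_o \right) = 0 $$

where $\underline{Z}_o$ = optimum $\underline{Z}$, so that

$$ P\, \underline{Z}_o = \left[ \frac{\underline{Z}_o^{*} P\, \underline{Z}_o}{\underline{Z}_o^{*} Q\, \underline{Z}_o} \right] Q\, \underline{Z}_o \qquad (2.2.7) $$

in which the bracketed factor is a scalar. Let

$$ G_o = \frac{\underline{Z}_o^{*} P\, \underline{Z}_o}{\underline{Z}_o^{*} Q\, \underline{Z}_o} \qquad (2.2.8) $$

Thus

$$ P\, \underline{Z}_o = G_o\, Q\, \underline{Z}_o \qquad (2.2.9) $$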
Equation (2.2.9) is an equation which $\underline{Z}_o$ must satisfy; it is
not, however, an explicit expression for $\underline{Z}_o$. Motivated by this
need, and seeing from section 2.1 that one way to find such an explicit
expression for $\underline{Z}_o$ is by letting the P matrix be written as
$P = \underline{U}_1 \underline{U}_1^{*}$ (i.e. letting P be of rank 1), let
us do the following:

Assume the signal field is produced by a single source located at
$(\theta_s, \phi_s)$ in the far field, which is generating a statistically
known random output.

Fig. 2.2.2 Incident Signal Field

The signal may be represented in the form (where we have suppressed
the $e^{\,j\omega t}$ time dependence)

$$ s(\underline{x}, t) = s(t)\, e^{-j\, \underline{k} \cdot \underline{x}} \quad \text{where} \quad \underline{k} = \frac{\omega}{c}\, \underline{u}_o = \frac{2\pi}{\lambda}\, \underline{u}_o $$

At the various hydrophone locations, the received signal is

$$ s(\underline{r}_i, t) = s(t)\, e^{-j \frac{\omega}{c}\, \underline{u}_o \cdot \underline{r}_i} = s(t)\, e^{-j \omega \left( \underline{u}_o \cdot \underline{r}_i / c \right)} $$

Let

$$ \tau_i = \frac{\underline{u}_o \cdot \underline{r}_i}{c} \qquad (2.2.10) $$
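Thus

$$ s(\underline{r}_i, t) = s(t)\, e^{-j \omega \tau_i} \qquad (2.2.11) $$

The average signal power present in any hydrophone due to this
signal is

$$ S = E\{ s^{*}(\underline{r}_i, t)\, s(\underline{r}_i, t) \} = E\{ s^{*}(t)\, e^{+j\omega\tau_i}\, s(t)\, e^{-j\omega\tau_i} \} = E\{ s^{*}(t)\, s(t) \} = R_s(0) $$

The normalized signal correlation matrix elements are

$$ p_{ij} = \frac{E\{ s^{*}(\underline{r}_i, t)\, s(\underline{r}_j, t) \}}{R_s(0)} = \frac{E\{ s^{*}(t)\, e^{+j\omega\tau_i}\, s(t)\, e^{-j\omega\tau_j} \}}{R_s(0)} = \frac{e^{\,j\omega(\tau_i - \tau_j)}\, E\{ s^{*}(t)\, s(t) \}}{R_s(0)} = e^{\,j\omega(\tau_i - \tau_j)} \qquad (2.2.12) $$

Define

$$ \underline{U}_1 = \begin{bmatrix} e^{+j\omega\tau_1} \\ \vdots \\ e^{+j\omega\tau_N} \end{bmatrix}, \qquad \underline{U}_1^{*} = \left[\, e^{-j\omega\tau_1} \;\; \cdots \;\; e^{-j\omega\tau_N} \,\right] \qquad (2.2.13) $$

Thus

$$ P = \underline{U}_1\, \underline{U}_1^{*} \qquad (2.2.14) $$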
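We can repeat the steps leading to equation (A3) of Appendix A
to get

$$ \underline{Z}_o = \left[ \frac{\underline{U}_1^{*}\, \underline{Z}_o}{G_o} \right] Q^{-1}\, \underline{U}_1 \qquad (2.2.15) $$

in which the bracketed factor is a scalar. Since the SNR is independent of
the magnitude of $\underline{Z}_o$, we see that

$$ \underline{Z}_o = Q^{-1}\, \underline{U}_1 \qquad (2.2.16) $$

is the solution for the optimum $\underline{Z}$.

Using this value of $\underline{Z}_o$, the optimum value of the SNR is

$$ \mathrm{SNR} = \underline{U}_1^{*}\, Q^{-1}\, \underline{U}_1 \qquad (2.2.17) $$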
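The chain from the delays $\tau_i$ to the optimum SNR can be traced numerically. The following sketch assumes a hypothetical two-hydrophone case: the frequency, the delays and the noise matrix `Q` (self-noise plus a correlated part) are invented for illustration. It builds $\underline{U}_1$ and $P$ from (2.2.13) and (2.2.14), forms $\underline{Z}_o = Q^{-1}\underline{U}_1$, and checks that (2.2.9) holds and that the SNR does not depend on the scale of $\underline{Z}$.

```python
import cmath
import math


def steering_from_delays(omega, delays):
    """U_1 of (2.2.13): elements exp(+j omega tau_i)."""
    return [cmath.exp(1j * omega * tau) for tau in delays]


def outer(u):
    """Rank-one matrix u u^*, as in (2.2.14)."""
    return [[a * b.conjugate() for b in u] for a in u]


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    """Matrix-vector product M v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def quad(v, m):
    """v^* M v, which is real when M is Hermitian."""
    return sum(a.conjugate() * b for a, b in zip(v, matvec(m, v))).real


# Hypothetical values: 100 Hz signal, second hydrophone delayed by 2.5 ms.
omega = 2 * math.pi * 100.0
U1 = steering_from_delays(omega, [0.0, 0.0025])
P = outer(U1)
Q = [[1.1, 0.5], [0.5, 1.1]]

Z_o = matvec(inverse_2x2(Q), U1)                 # equation (2.2.16)
G_o = quad(Z_o, P) / quad(Z_o, Q)                # equation (2.2.8)

# Residual of P Z_o = G_o Q Z_o, equation (2.2.9).
residual = max(abs(a - G_o * b)
               for a, b in zip(matvec(P, Z_o), matvec(Q, Z_o)))

# Scaling Z_o by any nonzero complex constant leaves the SNR unchanged.
Z_scaled = [3j * z for z in Z_o]
G_scaled = quad(Z_scaled, P) / quad(Z_scaled, Q)
```

For these numbers $\underline{U}_1^{*} Q^{-1} \underline{U}_1 = 2.2/0.96$, and the computed `G_o` agrees with it, as (2.2.17) requires.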

This expression represents the best SNR that we can achieve by
using the multichannel filter approach to the problem of optimizing the
SNR.

In the next section we will investigate under what conditions this
expression and the expression derived in section 2. 1 for the best SNR we 
can achieve by using the detector pattern approach yield the same values 
for the optimum SNR. 
Section 2.3 Relationships between the "detector pattern" and multichannel
filter approaches

In section 2.1 we found an expression for the best SNR we can achieve
by using the "detector pattern" approach. In section 2.2 we found an
expression for the best SNR we can achieve by using the multichannel filter
approach. We will now show that these two expressions for the optimum SNR are
equivalent if the noise is monochromatic. The monochromatic assumption
must be added to the multichannel filter approach because it is already
inherently contained in the detector pattern approach, i.e. in deriving
equation (2.1.2) the detector excitations were assumed to be monochromatic.

Showing that the two SNR expressions are equivalent entails expressing
the space-time correlation functions
$\phi_n(\tau, \underline{x}_k - \underline{x}_i)$ used in section 2.2 (i.e.
used in the sense that
$q_{ik} = E\{ n_i^{*}(t)\, n_k(t) \} = \phi_n(0, \underline{x}_k - \underline{x}_i)$)
as direct functions of the incident noise power $T(\theta, \phi)$ used in
section 2.1. In order to do this we will first find the space-time
correlation functions between the point detectors in the array as functions
of the incident noise field. Next we will find the incident noise power as a
function of the incident noise field. Finally we will be able to express the
space-time correlation functions as direct functions of the incident noise
power.

We will then apply the general theory to certain special noise power
distributions and a particular array configuration. We will show that, under
a monochromatic noise assumption, for these noise power distributions and
this array configuration, the detector pattern approach and the multichannel
filter approach yield exactly the same SNR results. Although we have used
particular noise power distributions and a particular array configuration,
this was only done to simplify the evaluation of certain integrals, and the
equivalence does not depend upon the incident noise field or the array
configuration.
For simplicity, let us initially assume that the total incident noise
field consists of one plane wave emanating from one source located on the
surface of a sphere of infinite radius as shown in Fig. 2.3.1. We will
denote this plane wave by $p(\theta_o, \phi_o, \underline{x}, t)$, where
$\theta_o$ and $\phi_o$ are spherical coordinates specifying the direction of
propagation, which is also denoted by $\underline{u}_o$. In complex notation

$$ p(\theta_o, \phi_o, \underline{x}, t) = p\, e^{-j\, \underline{k} \cdot \underline{x}}\, e^{+j \omega t} \qquad (2.3.1) $$

where $p = p(\theta_o, \phi_o)$ is a complex scalar random variable (for
electromagnetic fields p would have to be a complex vector random variable,
but we are considering acoustic fields) and

$$ \underline{k} = \text{wavenumber} = \frac{\omega}{c}\, \underline{u}_o(\theta_o, \phi_o) $$

$$ \underline{x} = x\, \underline{x}_o + y\, \underline{y}_o + z\, \underline{z}_o $$

$$ \underline{u}_o = -\sin\theta_o \cos\phi_o\; \underline{x}_o - \sin\theta_o \sin\phi_o\; \underline{y}_o - \cos\theta_o\; \underline{z}_o $$
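An alternate way of writing $p(\theta_o, \phi_o, \underline{x}, t)$ is

$$ p(\theta_o, \phi_o, \underline{x}, t) = p\, e^{\,j \omega \left( t - \underline{K} \cdot \underline{x} \right)} \qquad (2.3.2) $$

where

$$ \underline{K} = \frac{\underline{k}}{\omega} = \frac{1}{c}\, \underline{u}_o(\theta_o, \phi_o) \qquad (2.3.3) $$

is independent of frequency.

Since the actual noise sources we wish to investigate do not emit
monochromatic waveforms but rather superpositions of monochromatic
waveforms, let us change the assumption of one plane wave emanating from one
source to an arbitrary superposition of plane waves emanating from one
source. In this case

$$ \underline{k} = \underline{k}(\theta_o, \phi_o, \omega) = \frac{\omega}{c}\, \underline{u}_o(\theta_o, \phi_o) $$

$$ \underline{K} = \frac{\underline{k}}{\omega} \quad \text{(still independent of frequency)} $$

$$ p(\theta_o, \phi_o, \underline{x}, t) = \int_{\omega} p(\theta_o, \phi_o, \omega)\, e^{\,j \omega \left( t - \underline{K} \cdot \underline{x} \right)}\, d\omega \qquad (2.3.4) $$

Noting that $(t - \underline{K} \cdot \underline{x})$ is independent of
frequency, we may define

$$ p(\theta_o, \phi_o, \underline{x}, t) = q(\theta_o, \phi_o, t - \underline{K} \cdot \underline{x}) \qquad (2.3.5) $$

where $q(\theta, \phi, t - \underline{K} \cdot \underline{x})$ for fixed
$\theta$ and $\phi$ is a sample function of a stationary, zero-mean random
process, with space-time covariance function

$$ C_q(\theta_o, \phi_o, t_1 - t_2, \underline{x}_1 - \underline{x}_2) = E\{ q^{*}(\theta_o, \phi_o, t_1 - \underline{K} \cdot \underline{x}_1)\; q(\theta_o, \phi_o, t_2 - \underline{K} \cdot \underline{x}_2) \} \qquad (2.3.6) $$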
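Let us now drop the assumption of there being only one source located
at coordinates $(\theta_o, \phi_o)$ and instead assume that the noise field
is generated by one point source on the infinite sphere corresponding to
every different value of $(\theta, \phi)$. Thus the total noise field is
given by

$$ n(\underline{x}, t) = \iint q(\theta, \phi, t - \underline{K} \cdot \underline{x})\, d\Omega \qquad (2.3.7) $$

We will assume that the sources are statistically independent of
one another, implying that $q(\theta_1, \phi_1, \cdot)$ is independent of
$q(\theta_2, \phi_2, \cdot)$:

$$ E\{ q^{*}(\theta_1, \phi_1, t_1 - \underline{K}_1 \cdot \underline{x}_1)\; q(\theta_2, \phi_2, t_2 - \underline{K}_2 \cdot \underline{x}_2) \} = 0 \quad \text{for } (\theta_1, \phi_1) \neq (\theta_2, \phi_2) \qquad (2.3.8) $$

We may combine equations (2.3.6) and (2.3.8) to give

$$ C_q(\theta_1, \phi_1, \theta_2, \phi_2, t_1 - t_2, \underline{x}_1 - \underline{x}_2) = C_q(\theta_1, \phi_1, t_1 - t_2, \underline{x}_1 - \underline{x}_2)\; \delta(\theta_1 - \theta_2,\ \phi_1 - \phi_2) \qquad (2.3.9) $$

where

$$ \int_{0}^{\pi} \int_{0}^{2\pi} \delta(\theta_1 - \theta_2,\ \phi_1 - \phi_2)\, \sin\theta_2\, d\phi_2\, d\theta_2 = 1 \qquad (2.3.10) $$

Thus, the total noise field is stationary, with zero mean and space-time
covariance

$$ C_n(t_1 - t_2, \underline{x}_1 - \underline{x}_2) = E\{ n^{*}(\underline{x}_1, t_1)\; n(\underline{x}_2, t_2) \} = \iint C_q(\theta, \phi, t_1 - t_2, \underline{x}_1 - \underline{x}_2)\, d\Omega \qquad (2.3.11) $$

Note that, if the number of statistically independent noise sources is large,
the resulting total noise field is Gaussian, and the mean and covariance
completely describe the noise field.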
Two simple special cases of the above general noise field (evaluated
for the special case $\underline{x}_1 - \underline{x}_2 = 0$; we will later
show that this is the only case we must consider explicitly, all other cases
following from this one by equation (2.3.16)) are:

Monochromatic Noise

$$ C_q(\theta, \phi, \tau, \underline{x}_1 - \underline{x}_2 = 0) = C_q(\theta, \phi, 0)\, e^{+j 2\pi f_c \tau} \qquad (2.3.12a) $$

White Noise

$$ C_q(\theta, \phi, \tau, \underline{x}_1 - \underline{x}_2 = 0) = C_q(\theta, \phi, 0)\, \delta(\tau) \qquad (2.3.12b) $$

Let us now find the correlation between any two detector locations
$\underline{x}_1$ and $\underline{x}_2$ in the x-y plane.

The noise incident upon a receiver located at $\underline{x}_1$ is

$$ n(\underline{x}_1, t) = \iint q(\theta, \phi, t - \underline{K} \cdot \underline{x}_1)\, d\Omega \qquad (2.3.13) $$

We will now let $\underline{x}_1$ be the origin of our coordinate system,
since only the magnitude and direction of the difference
$\underline{x}_2 - \underline{x}_1$ is of importance (this is because the
noise sources are in the far field).
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We’ assume there is no. attenuation, as each plane 'wave comprising 
the noise field travels between, the detectors at positions and x^. All 
plane -waves, no matter what their frequency, move at -fhe same velocity^^ 
because the medium is assumed to be homogeneous and isotropic. 


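Let $\alpha$ be the angle between $(\theta, \phi)$ and $(\underline{x}_2 - \underline{x}_1)$, i.e. $\alpha = \alpha(\theta, \phi)$ is measured in the plane formed by the line $\underline{x}_2 - \underline{x}_1$ and the direction of the incident plane wave. As we have the coordinates set up, with the noise field incident from the first octant and $\underline{x}_2$ in the first quadrant, the noise wave hits $\underline{x}_2$ before $\underline{x}_1$ in time. Thus if the noise hits $\underline{x}_1$ at time $t$, it hits $\underline{x}_2$ at time $t - T_{12}\cos\alpha$, where

$$T_{12}\cos\alpha = \frac{|\underline{x}_2 - \underline{x}_1|\cos\alpha}{c} \tag{2.3.14}$$

On the other hand, if the noise is at $\underline{x}_2$ at time $t$, it is at $\underline{x}_1$ at time $t + T_{12}\cos\alpha$.

Thus

$$n(\underline{x}_1, t) = n(\underline{x}_2, t - T_{12}\cos\alpha) \tag{2.3.15a}$$

$$n(\underline{x}_2, t) = n(\underline{x}_1, t + T_{12}\cos\alpha) \tag{2.3.15b}$$
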
The space-time correlation function of the noise process is

$$\phi_n(\tau, \underline{x}_2 - \underline{x}_1) = E\left[n^*(\underline{x}_1, t)\, n(\underline{x}_2, t - \tau)\right]$$

$$= E\left[n^*(\underline{x}_1, t)\, n(\underline{x}_1, t - \tau + T_{12}\cos\alpha)\right]$$

$$= C_n(\tau - T_{12}\cos\alpha,\ \underline{x}_1 - \underline{x}_1)$$

$$= \int_\theta \int_\phi C_q(\theta, \phi, \tau - T_{12}\cos\alpha, 0)\, d\Omega \tag{2.3.16}$$

Under the monochromatic noise assumption of equation (2.3.12a)

$$\phi_n(\tau, \underline{x}_2 - \underline{x}_1) = \int_\theta \int_\phi C_q(\theta, \phi, 0)\, e^{j 2\pi f_0 [\tau - T_{12}\cos\alpha]}\, d\Omega \tag{2.3.17a}$$

Under the white noise assumption of equation (2.3.12b)

$$\phi_n(\tau, \underline{x}_2 - \underline{x}_1) = \int_\theta \int_\phi C_q(\theta, \phi, 0)\, \delta[\tau - T_{12}\cos\alpha]\, d\Omega \tag{2.3.17b}$$

Equations (2.3.17) will be used in the multichannel filter point of view when we have to evaluate $q_{ij} = E[m_i^*(t)\, m_j(t)]$.

The total noise power incident at the origin (or at any detector) is given by $\phi_n(0, \underline{0})$. This follows by analogy with the power contained in a one-dimensional random process whose autocorrelation function is $R(\tau)$, i.e. total power $= \int_\omega S(\omega)\, d\omega = R(0)$.

Noting that $\underline{x}_1 - \underline{x}_2 = \underline{0}$ implies $T_{12}\cos\alpha = 0$, we have, under both the monochromatic noise assumption and the white noise assumption,

$$\phi_n(0, \underline{0}) = \int_\theta \int_\phi C_q(\theta, \phi, 0)\, d\Omega \tag{2.3.18}$$

Thus the spatial distribution of the noise power under either the monochromatic or white noise assumptions is

$$T(\theta, \phi) = C_q(\theta, \phi, 0) \tag{2.3.19}$$

In general, the equations we must use to transform between the detector pattern and multichannel filter viewpoints are, from equations (2.3.17) and (2.3.19):

Under the monochromatic noise assumption

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = \int_\theta \int_\phi T(\theta, \phi)\, e^{j 2\pi f_0 [\tau - T_{ki}\cos\alpha]}\, d\Omega \tag{2.3.20a}$$

Under the white noise assumption

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = \int_\theta \int_\phi T(\theta, \phi)\, \delta[\tau - T_{ki}\cos\alpha]\, d\Omega \tag{2.3.20b}$$

Equations (2.3.20) are the results we have been striving for in this section. They express the space-time correlation functions used in section 2.2 as direct functions of the incident noise power $T(\theta, \phi)$ used in section 2.1.

We will now use these equations to show that under a monochromatic noise assumption (i.e. using equation (2.3.20a)), the detector pattern approach and the multichannel filter approach yield exactly the same values for the optimum SNR. We cannot show this for all possible spatial noise power distributions and all possible array configurations, because there is no general way of evaluating the integral in equation (2.3.20a). Because of this we will apply the theory developed above to three particular spatial noise power distributions and one particular array geometry. We will show that, under a monochromatic noise assumption, for these noise power distributions and this array configuration, the detector pattern approach and the multichannel filter approach yield exactly the same SNR results. Although we use a particular array configuration and particular spatial noise power distributions, this is done only to simplify the evaluation of the integrals, and the equivalence can be seen to be independent of the incident spatial noise field and the array geometry.

The three spatial noise power distributions we will consider are:

1. $T(\theta, \phi) = T(\theta, \phi)\, \delta(\theta - \theta_p,\ \phi - \phi_p)$

2. $T(\theta, \phi) = T$ (isotropic noise)

3. $T(\theta, \phi) = T$ for $(\theta, \phi)$ in the first octant, and $0$ otherwise

We will assume that the point detectors are equally spaced along the z axis, separated by a distance d.

Fig. 2.3.3 Detector Array


In Appendix B we evaluate $\phi_n(\tau, \underline{x}_k - \underline{x}_i)$ for the three spatial noise power distributions, assuming the noise is temporally monochromatic and white. In Appendix C we evaluate the elements of the A matrix of section 2.1. In Appendix D we evaluate the elements of the Q matrix of section 2.2.

Using the results of the appendices, let us compare the results of sections 2.1 and 2.2. From section 2.1 we have as our expression for the optimum SNR achievable by using the detector pattern approach

$$\mathrm{SNR} = \underline{V}_1^*\, A^{-1}\, \underline{V}_1$$

where

$$\underline{V}_1 = \mathrm{col}\left[\,1,\ e^{-j 2\pi (d/\lambda)\cos\theta_0},\ \ldots,\ e^{-j 2\pi (d/\lambda)(N-1)\cos\theta_0}\,\right]$$

Note that we set $\psi_i^0 = 2\pi (d/\lambda)(i-1)\cos\theta_0$ because of our assumed array geometry.

Summarizing section 2.2, we have as our expression for the optimum SNR achievable by using the multichannel filter approach

$$\mathrm{SNR} = \underline{U}_1^*\, Q^{-1}\, \underline{U}_1$$

where

$$\underline{U}_1 = \mathrm{col}\left[\,1,\ e^{+j\omega[-(d/c)\cos\theta_0]},\ \ldots,\ e^{+j\omega[-(d/c)(N-1)\cos\theta_0]}\,\right]$$

Note that we set $\tau_i = \frac{1}{c}\, \underline{u}_0 \cdot \underline{r}_i = -\frac{d(i-1)\cos\theta_0}{c}$ because of our assumed array geometry.

Since $\frac{\omega}{c} = \frac{2\pi f}{c} = \frac{2\pi f}{f\lambda} = \frac{2\pi}{\lambda}$, $\underline{V}_1$ and $\underline{U}_1$ are equal.

By comparing Appendices C and D, we see that, for all three spatial noise fields considered, the A matrix of section 2.1 and the Q matrix of section 2.2 are equal, thus demonstrating that for monochromatic noise we can optimize the SNR by using either the detector pattern or the multichannel filter approach. Also note that from equations (2.1.14) and (2.2.16) the optimal current excitations and the optimal filter weights are equal, implying that the current excitations in the detector pattern approach correspond to the filter weights in the multichannel filter approach.

In conclusion, we have shown in this chapter that, under the monochromatic noise assumption, the detector pattern approach and the multichannel filter approach are equivalent. Moreover, we saw that the current excitations of the detector pattern approach correspond to the filter weights of the multichannel filter approach. Again let us point out that although we have used a particular array configuration and particular spatial noise power distributions, this was done only to simplify the evaluation of certain integrals, and the equivalence can be seen to be independent of the array geometry and the incident spatial noise field.

In the next chapter, we will investigate the sensitivity of the SNR to small random changes in the detector locations and tap weights. We will have to use the equivalence developed in this chapter to derive an expression for this sensitivity. We will then show that when designing linear arrays where the spacing between detectors is less than one-half a wavelength, we should use tap weight values which maximize the SNR subject to a constraint on the above mentioned sensitivity, in order to keep this sensitivity within reasonable bounds.

Appendix A


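Maximization of the SNR

Maximize

$$L = \frac{\underline{I}^* C\, \underline{I}}{\underline{I}^* A\, \underline{I}}$$

with respect to $\underline{I}$.

Using the calculus of variations we get

$$\delta L = \frac{(\underline{I}^* A \underline{I})\left[(\delta\underline{I}^* C \underline{I}) + (\underline{I}^* C\, \delta\underline{I})\right] - (\underline{I}^* C \underline{I})\left[(\delta\underline{I}^* A \underline{I}) + (\underline{I}^* A\, \delta\underline{I})\right]}{(\underline{I}^* A \underline{I})^2} = 0$$

implying

$$\delta\underline{I}^*\left[C\underline{I}\,(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* C \underline{I})\right] + \left[(\underline{I}^* A \underline{I})\,\underline{I}^* C - (\underline{I}^* C \underline{I})\,\underline{I}^* A\right]\delta\underline{I} = 0$$

Since A and C are Hermitian

$$(\underline{I}^* A\, \delta\underline{I}) = (\delta\underline{I}^* A \underline{I})^*, \qquad (\underline{I}^* C\, \delta\underline{I}) = (\delta\underline{I}^* C \underline{I})^*$$

we have

$$\delta\underline{I}^*\left[C\underline{I}\,(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* C \underline{I})\right] + \left\{\delta\underline{I}^*\left[C\underline{I}\,(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* C \underline{I})\right]\right\}^* = 0$$

Let

$$\underline{G} = C\underline{I}\,(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* C \underline{I})$$

thus

$$\delta\underline{I}^* \underline{G} + \left[\delta\underline{I}^* \underline{G}\right]^* = 0$$

Since both of these terms are complex scalars and the second is the complex conjugate of the first, the real part of the complex scalar must be zero, i.e.

$$\mathrm{Re}\left[\delta\underline{I}^* \underline{G}\right] = 0$$

The only way this can be true for arbitrary $\delta\underline{I}$ is if $\underline{G} = \underline{0}$. Thus

$$C\underline{I}\,(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* C \underline{I}) = \underline{0}$$

By definition $C = \underline{V}_1 \underline{V}_1^*$, so

$$(\underline{V}_1 \underline{V}_1^* \underline{I})(\underline{I}^* A \underline{I}) - A\underline{I}\,(\underline{I}^* \underline{V}_1)(\underline{V}_1^* \underline{I}) = \underline{0}$$

and therefore

$$A\underline{I} = \underline{V}_1\, \frac{(\underline{I}^* A \underline{I})}{(\underline{I}^* \underline{V}_1)}, \qquad \underline{I} = q\, A^{-1} \underline{V}_1$$

where the complex scalar q is given by

$$q = \frac{(\underline{I}^* A \underline{I})}{(\underline{I}^* \underline{V}_1)}$$

But the SNR is independent of the magnitude of $\underline{I}$, so when finding the value of $\underline{I}$ which maximizes the SNR, we can drop the scalar q.

The direction of the optimum vector $\underline{I}$, which maximizes the SNR, is given by

$$\underline{I}_{\text{optimum}} = A^{-1} \underline{V}_1$$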
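
The result of Appendix A can be sanity-checked numerically. The following is a minimal sketch, assuming a small made-up real symmetric positive definite matrix `A` and steering vector `V` (neither is taken from the text). It compares the SNR at $A^{-1}\underline{V}_1$ with $\underline{V}_1^* A^{-1} \underline{V}_1$ and with randomly drawn excitations.

```python
import random


def snr(i_vec, a, v):
    """SNR = |I* V|^2 / (I* A I) for real 2-vectors (conjugation is trivial)."""
    num = (i_vec[0] * v[0] + i_vec[1] * v[1]) ** 2
    ai = [a[0][0] * i_vec[0] + a[0][1] * i_vec[1],
          a[1][0] * i_vec[0] + a[1][1] * i_vec[1]]
    den = i_vec[0] * ai[0] + i_vec[1] * ai[1]
    return num / den


# Illustrative values only (assumptions, not from the report).
A = [[2.0, 0.5], [0.5, 1.0]]
V = [1.0, 1.0]

# I_opt = A^{-1} V, using the closed-form inverse of a 2x2 matrix.
det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
I_OPT = [(A[1][1] * V[0] - A[0][1] * V[1]) / det_a,
         (-A[1][0] * V[0] + A[0][0] * V[1]) / det_a]

SNR_OPT = snr(I_OPT, A, V)
V_AINV_V = V[0] * I_OPT[0] + V[1] * I_OPT[1]  # V* A^{-1} V

# No randomly chosen excitation should beat the optimum.
rng = random.Random(0)
BEST_RANDOM = max(
    snr([rng.uniform(-1, 1), rng.uniform(-1, 1)], A, V) for _ in range(10000)
)
```

Because the SNR does not depend on the scale of the excitation, any nonzero multiple of `I_OPT` gives the same value, which is why the scalar q can be dropped.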
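Appendix B  Evaluation of $\phi_n(\tau, \underline{x}_k - \underline{x}_i)$ for Temporally Monochromatic and White Noise

Note that, for the array geometry of Fig. 2.4.1, equation (2.3.14) becomes

$$T_{ki}\cos\alpha = \frac{d(i-k)}{c}\cos\theta$$

because $\underline{u}_0(\theta, \phi) = -\sin\theta\cos\phi\ \underline{x}_0 - \sin\theta\sin\phi\ \underline{y}_0 - \cos\theta\ \underline{z}_0$ and $(\underline{x}_i - \underline{x}_k) = (i-k)\, d\ \underline{z}_0$.

If the noise is temporally monochromatic, for the three spatial noise power distributions under consideration, we have from equation (2.3.20a)

case 1.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T(\theta_p, \phi_p)\, e^{j 2\pi f_0\left[\tau - \frac{d(i-k)}{c}\cos\theta_p\right]}$$

case 2.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T \int_0^{\pi}\!\!\int_0^{2\pi} e^{j 2\pi f_0\left[\tau - \frac{d(i-k)}{c}\cos\theta\right]} \sin\theta\, d\theta\, d\phi$$

letting $y = 2\pi f_0 \frac{d(i-k)}{c}\cos\theta$ and replacing $c/f_0$ by $\lambda$ gives

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = 2T\, e^{j 2\pi f_0 \tau}\, \frac{\sin\left[2\pi (d/\lambda)(i-k)\right]}{(d/\lambda)(i-k)}$$

case 3.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T \int_0^{\pi/2}\!\!\int_0^{\pi/2} e^{j 2\pi f_0\left[\tau - \frac{d(i-k)}{c}\cos\theta\right]} \sin\theta\, d\theta\, d\phi$$

proceeding as in case 2, we get

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = \frac{\pi T}{2}\, e^{j 2\pi f_0 \tau}\, e^{-j\pi (d/\lambda)(i-k)}\, \frac{\sin\left[\pi (d/\lambda)(i-k)\right]}{\pi (d/\lambda)(i-k)}$$

If the noise is temporally white, for the three spatial noise power distributions under consideration, we have from equation (2.3.20b)

case 1.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T(\theta_p, \phi_p)\, \delta\left[\tau - \frac{(i-k)d}{c}\cos\theta_p\right]$$

case 2.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T \int_0^{\pi}\!\!\int_0^{2\pi} \delta\left[\tau - \frac{(i-k)d}{c}\cos\theta\right] \sin\theta\, d\theta\, d\phi$$

letting $y = \frac{(i-k)d}{c}\cos\theta$ gives

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = \begin{cases} \dfrac{2\pi T c}{(i-k)d} & \text{if } |\tau| < \dfrac{(i-k)d}{c} \\[2mm] 0 & \text{otherwise} \end{cases}$$

case 3.

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = T \int_0^{\pi/2}\!\!\int_0^{\pi/2} \delta\left[\tau - \frac{(i-k)d}{c}\cos\theta\right] \sin\theta\, d\theta\, d\phi$$

proceeding as in case 2, we get

$$\phi_n(\tau, \underline{x}_k - \underline{x}_i) = \begin{cases} \dfrac{\pi T c}{2\,(i-k)d} & \text{if } 0 < \tau < \dfrac{(i-k)d}{c} \\[2mm] 0 & \text{otherwise} \end{cases}$$

Appendix C  Evaluation of the A matrix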


From equation (2.1.12)

$$a_{ki} = \int_\theta \int_\phi e^{+j\psi_k}\, e^{-j\psi_i}\, T\, d\Omega$$

where

$$\psi_n = 2\pi\left[\frac{x_n}{\lambda}\sin\theta\cos\phi + \frac{y_n}{\lambda}\sin\theta\sin\phi + \frac{z_n}{\lambda}\cos\theta\right]$$

and $(x_n, y_n, z_n)$ is the position of the $n$th detector.

For our array geometry the $i$th detector is located on the z axis, at a distance $z_i = d(i-1)$ from the origin, so the above general expression becomes $\psi_n = 2\pi (d/\lambda)(n-1)\cos\theta$, thus

$$a_{ki} = \int_\theta \int_\phi e^{j 2\pi (d/\lambda)(k-i)\cos\theta}\, T(\theta, \phi)\, d\Omega$$

For the three spatial noise power distributions under consideration:

case 1.

$$a_{ki} = e^{j 2\pi (d/\lambda)(k-i)\cos\theta_p}\, T(\theta_p, \phi_p)$$

case 2.

$$a_{ki} = T \int_0^{\pi}\!\!\int_0^{2\pi} e^{j 2\pi (d/\lambda)(k-i)\cos\theta} \sin\theta\, d\theta\, d\phi = 4\pi T\, \frac{\sin\left[2\pi (d/\lambda)(k-i)\right]}{2\pi (d/\lambda)(k-i)}$$

case 3.

$$a_{ki} = T \int_0^{\pi/2}\!\!\int_0^{\pi/2} e^{j 2\pi (d/\lambda)(k-i)\cos\theta} \sin\theta\, d\theta\, d\phi = \frac{\pi T}{2}\, e^{j\pi (d/\lambda)(k-i)}\, \frac{\sin\left[\pi (d/\lambda)(k-i)\right]}{\pi (d/\lambda)(k-i)}$$
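
The closed forms for cases 2 and 3 can be checked against direct numerical integration. This is a minimal sketch, assuming $T = 1$ and writing $x = 2\pi (d/\lambda)(k-i)$; the function names and the midpoint-rule resolution are choices made here for illustration. The $\phi$ integration is done analytically ($2\pi$ for isotropic noise, $\pi/2$ for the first octant), so only the $\theta$ integral is computed numerically.

```python
import cmath
import math


def theta_integral(x, upper, n=2000):
    """Midpoint-rule value of the integral of exp(j x cos(theta)) sin(theta)
    over theta from 0 to `upper`."""
    h = upper / n
    total = 0j
    for m in range(n):
        th = (m + 0.5) * h
        total += cmath.exp(1j * x * math.cos(th)) * math.sin(th)
    return total * h


def a_isotropic_numeric(x):
    """Case 2 with T = 1: phi runs over 0..2*pi, theta over 0..pi."""
    return 2 * math.pi * theta_integral(x, math.pi)


def a_isotropic_closed(x):
    return 4 * math.pi * math.sin(x) / x


def a_octant_numeric(x):
    """Case 3 with T = 1: phi and theta both run over 0..pi/2."""
    return (math.pi / 2) * theta_integral(x, math.pi / 2)


def a_octant_closed(x):
    half = x / 2
    return (math.pi / 2) * cmath.exp(1j * half) * math.sin(half) / half


# Example argument: d = 0.4 wavelengths, k - i = 1 (illustrative values).
X_EXAMPLE = 2 * math.pi * 0.4
```

For the isotropic case the imaginary part of the numerical result should vanish to within integration error, consistent with the real closed form.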
Appendix D  Evaluation of the Q matrix


From equations (2.2,4) and (2.3, 16) 




In particular, when the noise is temporally monochromatic, for 

the three spatial noise power distributions under consideration, we have 

^b 1 \ 

from Appendix B ^remember J 


case 1. 




j 2 IT (i-k)cos 0p 


case 2', q, 


kf 


4tt T sin [2 IT (^) (i-k)] 


2it (-^)(i-k) 


case 3. q. 


^ sin[-n-(-^) U-k)] 


kf 


TT Cj~) (i-k) 
CHAPTER 3

Error Analysis of Point Detector Arrays

If we were to design a point detector array or a multichannel filter to extract a signal, incident from direction $(\theta_0, \phi_0)$, from background noise, using the criterion of maximizing the SNR, as developed in chapter two, the following types of errors might affect the performance of our system:

1. Small random errors in the antenna excitations or filter coefficients (possibly due in part to round-off errors if we use a digital system to determine the filter coefficients).

2. Imperfect knowledge of the noise field.

That error type two is of importance is self-evident. However, the reader may ask if error type one is very important. It turns out that error type one can be of major importance, as can be seen by considering the following problem:

Assume we wish to receive a signal propagating in the z direction, having wavenumber $k_0 = 2\pi/\lambda$, using a linear array of N isotropic point detectors located along the z axis. Because of the sampling theorem, our first inclination would be to space the N detectors one-half wavelength apart ($d = \lambda/2$) and then proceed to optimize the excitations so as to maximize the SNR. The question is, how much does error type one affect us if we use this spacing? It will be shown that for spacings between detectors of less than about one-half wavelength, the super-gain ratio, which is a measure of how much type one errors affect the detector pattern and thus the SNR, begins to get very large. This means that very small errors in the antenna excitations cause large variations in the received SNR. A better approach to use when the detectors are separated by less than a wavelength would be to maximize the SNR subject to a constraint on the super-gain, or type one, error. This is one of the things we will investigate in this chapter.

Because the above mentioned types of errors are present in our system, the following questions arise:

1. If we optimize the processor as in chapter two, what are the effects of error type one on the SNR?

2. What is the optimum SNR we can achieve if we optimize the processor subject to a constraint on error type one?

3. Can we develop an adaptive algorithm which maximizes the SNR subject to a constraint on error type one?

The reason for undertaking this entire investigation is to answer question three, because the development of this type of algorithm will enable us to design array processors which will no longer significantly suffer from the deleterious effects of error types one and two that present day arrays suffer from.

In this chapter we will answer questions one and two. We will answer question three in chapters four, five and six.

Section 3.1  Sensitivity of the SNR to Random Errors in the Detector Excitations and Locations.


Consider an array of N isotropic detectors placed at some prescribed positions in space whose Cartesian coordinates are given by $\underline{x}_i$, $i = 1, \ldots, N$. Let $(\theta_0, \phi_0)$ be the angular coordinates of the main beam, and $I_i$ be the current excitation in the $i$th detector. From equation (2.1.13) the SNR is

$$\mathrm{SNR} = \frac{|\underline{I}^* \underline{V}_1|^2}{\underline{I}^* A\, \underline{I}} \tag{3.1.1}$$

where all quantities have been defined previously in section 2.1.

By the sensitivity of the SNR to random errors in the detector excitations and locations we mean the following: if we let the detector currents and positions be composed of a nominal term plus a random term, i.e. $\underline{I} = \underline{I}_n + \underline{I}_r$ and $\underline{x}_i = \underline{x}_{in} + \underline{x}_{ir}$, the SNR is now defined as the expected value of equation (3.1.1). This expectation might turn out to be of the form of a nominal term plus an additional term; we would then define the ratio of the additional term to the nominal term as our sensitivity factor. The calculation of this expected value, as it stands, is exceedingly complex. However, the SNR in equation (3.1.1) may also be expressed as

$$\mathrm{SNR} = \frac{\Phi(\underline{u}_0)}{\int \Phi(\underline{u})\, T(\underline{u})\, d\Omega} \tag{3.1.2}$$

where $\Phi(\underline{u}) = |\underline{I}^* \underline{V}|^2$ is the power pattern, $\Phi(\underline{u}_0) = |\underline{I}^* \underline{V}_1|^2$ is the value of the power pattern at $(\theta_0, \phi_0)$, and $T(\underline{u})$ is the incident noise power. Again, if we let the detector currents and locations be random, the calculation of the expected value of equation (3.1.2) is exceedingly complex.


However, equation (3.1.2) indicates to us that we can use the super-gain ratio, which is a measure of the sensitivity of the power pattern $\Phi(\underline{u})$ to random errors in the detector excitations and positions, as an alternate measure of the sensitivity of the SNR to random errors in the detector excitations and positions.

An intuitive justification for this is as follows: $\Phi(\underline{u})$ is the power pattern. Since the signal is incident from direction $\underline{u}_0$, the power pattern is usually designed so as to peak up in the $\underline{u}_0$ direction, e.g. as sketched in Fig. 3.1.1.

The solid line in Fig. 3.1.1 represents the theoretical power pattern, while the dashed line represents the actual pattern we may get due to random errors in current excitations and detector locations. Small changes in the power pattern affect the numerator of the SNR much more than the denominator, because the numerator is proportional to the pattern while the denominator is proportional to the integral of the power pattern over all space, which doesn't change as much. Put another way, if the power pattern changes slightly, the main reason for the change in the SNR is that the signal power received by the array drops from level A to level B. While the noise power received by the array changes, it does not change to as great an extent as the signal power received. Thus our premise is that

$$\Delta[\mathrm{SNR}] \sim \Delta[\Phi(\underline{u}_0)]$$


The super-gain ratio Q is derived in Appendix A and is given by equation (A15)

$$Q = \frac{\underline{I}^* \underline{I}}{\underline{I}^* B\, \underline{I}} \tag{3.1.3}$$

where $B = \int \underline{V}\, \underline{V}^*\, d\Omega$.

Q is a function of the spacing between detectors through $\underline{V}$, and through $\underline{I}$ is also a function of the signal location (or main beam direction) and the noise field (i.e. assuming we use that value of $\underline{I}$ which maximizes the SNR).

To investigate how the SNR and Q factor behave as a function of array geometry, we shall focus on the special case of Fig. 3.1.2, consisting of a linear array of four isotropic detectors embedded in a uniform noise field (i.e. $T(\theta, \phi) = 1$ for $0 < \theta < \pi$, $0 < \phi < 2\pi$), whose main beam is at broadside ($\theta_0 = 0$) or endfire ($\theta_0 = \pi/2$, $\phi_0 = 0$), and whose current excitation is given by the optimum value of $\underline{I}$ we found in chapter two (i.e. that value of $\underline{I}$ which maximizes the SNR).

Before we can obtain numerical results, we need the explicit form of the A matrix in the SNR expression for the case where $T(\theta, \phi) = 1$ for all values of $\theta$ and $\phi$, and of the matrix $\int \underline{V}\, \underline{V}^*\, d\Omega$ in the Q factor expression. Because of our choice of an isotropic noise field, these matrices become identical, and, in this case, the elements of A, denoted by $a_{k\ell}$, can be integrated out in closed form for planar arrays of isotropic elements. Assuming the detectors are in the xy plane, the elements of A are given by

$$a_{k\ell} = \int\!\!\int e^{+j\psi_k}\, e^{-j\psi_\ell}\, d\Omega = \int_0^{2\pi}\!\!\int_0^{\pi} \exp\left\{ j 2\pi \left[ \left(\frac{x_k - x_\ell}{\lambda}\right)\sin\theta\cos\phi + \left(\frac{y_k - y_\ell}{\lambda}\right)\sin\theta\sin\phi \right] \right\} \sin\theta\, d\theta\, d\phi \tag{3.1.4}$$



We may rewrite the integrand by noting the following identity

$$A_1\cos\phi + A_2\sin\phi = \mathrm{Re}\left\{ A_1 e^{j\phi} + A_2 e^{j(\phi - \pi/2)} \right\}$$

$$A_1 e^{j\phi} + A_2 e^{j\phi} e^{-j\pi/2} = \left[A_1 + j(-A_2)\right] e^{j\phi} = \sqrt{A_1^2 + A_2^2}\ e^{-j\tan^{-1}(A_2/A_1)}\, e^{j\phi} = \sqrt{A_1^2 + A_2^2}\ e^{j\left(\phi - \tan^{-1}(A_2/A_1)\right)}$$

Taking the real part gives the result

$$A_1\cos\phi + A_2\sin\phi = \sqrt{A_1^2 + A_2^2}\ \cos\left(\phi - \tan^{-1}\frac{A_2}{A_1}\right)$$
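
The identity is easy to confirm numerically. The sketch below is illustrative only; it uses `math.atan2(a2, a1)` in place of $\tan^{-1}(A_2/A_1)$, which agrees with the principal value when $A_1 > 0$ and also picks the correct quadrant when $A_1 < 0$ (an assumption about branch choice that goes beyond what the text states).

```python
import math


def lhs(a1, a2, phi):
    """A1 cos(phi) + A2 sin(phi)."""
    return a1 * math.cos(phi) + a2 * math.sin(phi)


def rhs(a1, a2, phi):
    """sqrt(A1^2 + A2^2) cos(phi - gamma) with gamma from atan2."""
    amplitude = math.hypot(a1, a2)
    gamma = math.atan2(a2, a1)
    return amplitude * math.cos(phi - gamma)


# Largest discrepancy over a grid of angles and a few coefficient pairs.
PAIRS = [(1.0, 0.5), (0.3, -2.0), (-1.5, 0.7), (2.0, 0.0)]
MAX_ERR = max(
    abs(lhs(a1, a2, 0.01 * n) - rhs(a1, a2, 0.01 * n))
    for (a1, a2) in PAIRS
    for n in range(629)
)
```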


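thus

$$a_{k\ell} = \int_0^{2\pi}\!\!\int_0^{\pi} e^{\,j 2\pi \sin\theta\, \rho_{k\ell} \cos(\phi - \gamma_{k\ell})} \sin\theta\, d\theta\, d\phi \tag{3.1.5}$$

where

$$\rho_{k\ell} = \sqrt{\left(\frac{x_k - x_\ell}{\lambda}\right)^2 + \left(\frac{y_k - y_\ell}{\lambda}\right)^2} \tag{3.1.6}$$

$$\gamma_{k\ell} = \tan^{-1}\left[\frac{y_k - y_\ell}{x_k - x_\ell}\right], \qquad 0 \le \gamma_{k\ell} < \pi \tag{3.1.7}$$

Note that $\gamma_{k\ell}$ is a multivalued function, and since it appears in the integrand, it must be restricted. We will restrict $\gamma_{k\ell}$ to the range $0 \le \gamma_{k\ell} < \pi$. However, when we do this, if $\gamma_{k\ell}$ appears explicitly in the resulting formula we get for $a_{k\ell}$, we cannot use the formula to calculate both $a_{k\ell}$ and $a_{\ell k}$, because we will not satisfy the requirement that $a_{k\ell} = a_{\ell k}^*$ due to the restriction on $\gamma$. The procedure to use is as follows: if $\gamma$ appears in the formula for $a_{k\ell}$, use the formula to evaluate $a_{k\ell}$ for k strictly less than $\ell$, and evaluate $a_{k\ell}$ for $k > \ell$ by computing $a_{\ell k}^*$. If $\gamma$ does not appear in the formula for $a_{k\ell}$ (this is the result we will obtain in our problem, but we get this only because of the particular way we defined $\underline{V}$ and $\underline{I}$), there is no problem. In either case, to evaluate $a_{kk}$, $\gamma_{kk}$ is indeterminate and hence we must evaluate the diagonal terms separately.

Since

$$\frac{1}{2\pi} \int_0^{2\pi} e^{\,j x \cos(\phi - \gamma_{k\ell})}\, d\phi = J_0(x)$$

we have

$$a_{k\ell} = 2\pi \int_0^{\pi} \sin\theta\ J_0(2\pi \rho_{k\ell} \sin\theta)\, d\theta \tag{3.1.8}$$

But

$$\int_0^{\pi} J_0(x \sin\theta)\, \sin\theta\, d\theta = \frac{2 \sin x}{x}$$

so that

$$a_{k\ell} = 4\pi\, \frac{\sin(2\pi \rho_{k\ell})}{2\pi \rho_{k\ell}} \qquad \text{for } k \ne \ell \tag{3.1.9}$$

If $k = \ell$ we have

$$a_{kk} = \int_0^{2\pi}\!\!\int_0^{\pi} \sin\theta\, d\theta\, d\phi = 4\pi \tag{3.1.10}$$

For the special case of the four element linear array shown in Fig. 3.1.2, the elements of the A matrix are given by

$$A = \begin{bmatrix}
4\pi & \frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} & \frac{\lambda}{d}\sin\frac{4\pi d}{\lambda} & \frac{2\lambda}{3d}\sin\frac{6\pi d}{\lambda} \\[1mm]
\frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} & 4\pi & \frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} & \frac{\lambda}{d}\sin\frac{4\pi d}{\lambda} \\[1mm]
\frac{\lambda}{d}\sin\frac{4\pi d}{\lambda} & \frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} & 4\pi & \frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} \\[1mm]
\frac{2\lambda}{3d}\sin\frac{6\pi d}{\lambda} & \frac{\lambda}{d}\sin\frac{4\pi d}{\lambda} & \frac{2\lambda}{d}\sin\frac{2\pi d}{\lambda} & 4\pi
\end{bmatrix} \tag{3.1.11}$$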
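
The closed form $a_{k\ell} = 4\pi \sin(2\pi\rho_{k\ell})/(2\pi\rho_{k\ell})$ can be checked against a brute-force evaluation of the double integral for an isotropic noise field with $T = 1$. This is a minimal sketch; the function names, the midpoint rule, and the grid sizes are assumptions made here, and `dx`, `dy` denote the detector separations $(x_k - x_\ell)/\lambda$ and $(y_k - y_\ell)/\lambda$ in wavelengths.

```python
import cmath
import math


def a_numeric(dx, dy, n_theta=200, n_phi=64):
    """Midpoint-rule evaluation of the double integral over theta in (0, pi)
    and phi in (0, 2*pi) of exp(j*2*pi*sin(theta)*(dx*cos(phi)+dy*sin(phi)))
    times sin(theta)."""
    h_t = math.pi / n_theta
    h_p = 2 * math.pi / n_phi
    total = 0j
    for m in range(n_theta):
        th = (m + 0.5) * h_t
        st = math.sin(th)
        for q in range(n_phi):
            ph = (q + 0.5) * h_p
            phase = 2 * math.pi * st * (dx * math.cos(ph) + dy * math.sin(ph))
            total += cmath.exp(1j * phase) * st
    return total * h_t * h_p


def a_closed(dx, dy):
    """Closed form: 4*pi on the diagonal, 4*pi*sin(x)/x otherwise."""
    rho = math.hypot(dx, dy)
    if rho == 0.0:
        return 4 * math.pi
    x = 2 * math.pi * rho
    return 4 * math.pi * math.sin(x) / x
```

The result depends on the separation only through $\rho_{k\ell}$, so the angle $\gamma_{k\ell}$ drops out, as the text anticipates for this problem.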
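The optimum (with respect to maximum SNR) value of $\underline{I}$ is given by equation (2.1.18)

$$\underline{I}_{\text{opt}} = A^{-1} \underline{V}_1$$

Using this value of $\underline{I}$ we found in chapter 2 that the SNR is given by

$$\mathrm{SNR} = \underline{V}_1^*\, A^{-1}\, \underline{V}_1 \tag{3.1.12}$$

Again using this value of $\underline{I}$, the Q factor is given by

$$Q = \frac{\underline{I}^* \underline{I}}{\underline{I}^* A\, \underline{I}} = \frac{\underline{V}_1^* \left[A^{-1}\right]^2 \underline{V}_1}{\underline{V}_1^*\, A^{-1}\, \underline{V}_1} \tag{3.1.13}$$

If the main beam is at broadside ($\theta_0 = 0$) then, in our example

$$\underline{V}_1 = \mathrm{col}\left[e^{j\psi_1^0},\ e^{j\psi_2^0},\ e^{j\psi_3^0},\ e^{j\psi_4^0}\right] = \mathrm{col}\left[1,\ 1,\ 1,\ 1\right] \tag{3.1.14}$$

If the main beam is at endfire ($\theta_0 = \pi/2$, $\phi_0 = 0$) then, in our example

$$\underline{V}_1 = \mathrm{col}\left[e^{j\psi_1^0},\ e^{j\psi_2^0},\ e^{j\psi_3^0},\ e^{j\psi_4^0}\right] = \mathrm{col}\left[e^{j(-3\pi d/\lambda)},\ e^{j(-\pi d/\lambda)},\ e^{j(\pi d/\lambda)},\ e^{j(3\pi d/\lambda)}\right] \tag{3.1.15}$$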
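
Equations (3.1.11) to (3.1.15) are enough to compute the SNR and Q factor for any spacing. The following is a minimal sketch in plain Python, assuming isotropic noise with $T = 1$, spacing `d` given in wavelengths, and an array centered on the origin as in the endfire steering vector above. The helper names and the small Gauss-Jordan solver are illustrative choices, not part of the original analysis.

```python
import cmath
import math


def a_matrix(n, d):
    """A matrix of an n-element linear array, spacing d in wavelengths."""
    def elem(k, l):
        if k == l:
            return 4 * math.pi
        x = 2 * math.pi * abs(k - l) * d
        return 4 * math.pi * math.sin(x) / x
    return [[elem(k, l) for l in range(n)] for k in range(n)]


def steering(n, d, endfire):
    """V_1 for broadside (all ones) or endfire (centered linear phase)."""
    if not endfire:
        return [1 + 0j] * n
    return [cmath.exp(1j * 2 * math.pi * (k - (n - 1) / 2) * d)
            for k in range(n)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[complex(v) for v in a[i]] + [complex(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


def snr_and_q(n, d, endfire):
    """Return (SNR, Q) for the SNR-optimal excitation I = A^{-1} V_1."""
    a = a_matrix(n, d)
    v = steering(n, d, endfire)
    i_opt = solve(a, v)
    # SNR = V* A^{-1} V, which also equals I* A I for this excitation.
    snr = sum(vk.conjugate() * ik for vk, ik in zip(v, i_opt)).real
    # Q = I* I / (I* A I).
    q = sum(abs(ik) ** 2 for ik in i_opt) / snr
    return snr, q
```

At half-wavelength spacing every off-diagonal element of A vanishes, so $A = 4\pi I$ and the four-element array gives SNR $= 1/\pi$ and $Q = 1/(4\pi)$ for either beam direction. Sweeping `d` with `snr_and_q(4, d, True)` or `snr_and_q(10, d, False)` gives curves that can be compared with the figures discussed in the text.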
Similar results can be obtained for the ten element linear array shown below in Fig. 3.1.3.

Fig. 3.1.3 Ten element linear array

The following graphs of SNR and Q vs. $d/\lambda$ were obtained for four and ten element linear arrays, in isotropic noise, when the main beam was at broadside and endfire, using the optimum excitation:

Fig. 3.1.6 Ten Element Array - Broadside Signal

By comparing Figs. 3.1.4 and 3.1.6, 3.1.5 and 3.1.7, 3.1.8 and 3.1.10, 3.1.9 and 3.1.11 we see that the general shape of the curves and the ratios of the maxima to the minima of each curve are independent of the number of elements (four vs. ten) in the array. Hence in our future work we will only consider four element arrays in order to conserve computer time.

With reference to Figs. 3.1.4 and 3.1.5, notice that if we use those current excitations which maximize the SNR, the SNR and Q factor that we get when the signal impinges from broadside can vary between 0.2 and 0.5 (a ratio of 1:2.5) and between 0.05 and 0.15 (a ratio of 1:3) respectively, depending upon what spacing we use between detectors, as long as it is greater than $0.2\lambda$.

Aside: Note that the graphs only cover the region up to $d = 1.8\lambda$ because this is the region of interest to us; however, if we extended, for example, Fig. 3.1.4, it would look as follows

[Figure: SNR plotted over an extended range of spacings]

and all the other graphs behave similarly. Note also that our graphs don't cover the region $d = 0$ to $d = 0.2\lambda$ because in this region mutual coupling effects between detectors come into play, and our analysis does not take this into account.

This means that for this array geometry, when the signal impinges from broadside, it is relatively unimportant what spacing between detectors we use. Furthermore, it is acceptable for us to design the array (i.e. choose the current excitations or tap weights) by maximizing the SNR alone, rather than by maximizing the SNR subject to a constraint on the Q factor, because the Q factor which results from the first design procedure will never be excessive.

However, with reference to Figs. 3.1.8 and 3.1.9, notice that if we use those current excitations which maximize the SNR, the SNR and Q factor we get when the signal impinges from endfire can vary between 0.2 and 1.0 (a ratio of 1:5) and from 0.06 to a number well exceeding 0.74 (a ratio very much greater than 1:12) respectively, depending upon what spacing we use between detectors, as long as it is greater than $0.2\lambda$. This means that for this same array geometry, when the signal impinges from endfire, the spacing between detectors that we use is relatively important, i.e. we would prefer to space the detectors as close together as possible. However, if we do this, the Q factor, which is a measure of the sensitivity of the SNR to random fluctuations in the tap weights, will be so large as to make the array processor useless.

The conclusion we draw from these graphs is that if we are going to use a certain detector array and we are not sure a priori that, for all possible incident signal directions, the Q factor never gets too large when we use those current excitations (or tap weights) which maximize the SNR, we must instead use those excitations which maximize the SNR (equation 3.1.1) subject to a constraint on the super-gain ratio (equation 3.1.3). We will see how to find these excitations in the next section.

Section 3.2  Maximization of the SNR subject to a constraint on the super-gain ratio

The problem is to maximize

$$\mathrm{SNR} = \frac{|\underline{I}^* \underline{V}_1|^2}{\underline{I}^* A\, \underline{I}}$$

subject to the constraint that

$$Q = \frac{\underline{I}^* \underline{I}}{\underline{I}^* B\, \underline{I}}$$

takes a prescribed value.

Appendix B summarizes the work of Lo, Lee, and Lee (19), who recently developed a numerical technique for solving this problem. However, their work yields a (sometimes complex) polynomial equation whose roots (when found numerically) can then be used to calculate the value of $\underline{I}$ which is the solution to the problem. Our contribution makes use of a state variable technique which enables us to reduce Lo, Lee and Lee's numerical problem from one of finding the complex roots of a high order polynomial with complex coefficients (in all the specific numerical cases treated in their paper the coefficients of the polynomials were real, but this is not necessarily true in general and is not true in the second example we will consider in this section) to one of finding the eigenvalues of a real matrix, which is considerably faster to do.

Since we can only get numerical results for particular examples, we will consider the following two specific problems:

1. Solve for that value of $\underline{I}$ which will maximize the SNR subject to the constraint Q = 0.08 for a linear array of four isotropic detectors spaced $d = 0.8\lambda$ apart, embedded in a uniform noise field ($T(\theta, \phi) = 1$ for $0 < \theta < \pi$, $0 < \phi < 2\pi$), whose main beam is at broadside ($\theta_0 = 0$). From Fig. 3.1.5 we see that if we did not constrain Q, but instead used that value of $\underline{I}$ which maximized the SNR, we would get a value of Q equal to approximately 0.12.

2. Solve for that value of $\underline{I}$ which will maximize the SNR subject to the constraint Q = 0.11 for a linear array of four isotropic detectors spaced $d = 0.4\lambda$ apart, embedded in a uniform noise field, whose main beam is at endfire ($\theta_0 = \pi/2$, $\phi_0 = 0$). From Fig. 3.1.9 we see that if we did not constrain Q, but instead used that value of $\underline{I}$ which maximized the SNR, we would get a value of Q equal to approximately 0.18.

We will use Lo, Lee and Lee's method to do the first example, and our method to do the second. As far as the first example is concerned, $\underline{V}_1 = \mathrm{col}\,[1\ 1\ 1\ 1]$ and we may choose for our complete set (see Appendix B) the following vectors:

$$\underline{a}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad
\underline{a}_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad
\underline{a}_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad
\underline{a}_4 = \begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \tag{3.2.1}$$


The W matrix (equation B8) has vectors $\underline{a}_1, \underline{W}_2, \underline{W}_3, \underline{W}_4$ as columns, where

$$\underline{W}_i = (0.08A - I)\, \underline{a}_i\, s^2 + 2A\, \underline{a}_i\, s + A\,(0.08A - I)^{-1} A\, \underline{a}_i, \qquad i = 2, 3, 4 \tag{3.2.2}$$

The elements of this matrix are real polynomials in s of degree two, except for the first column, whose elements are all equal to one. Setting the determinant of this W matrix equal to zero results in a polynomial of sixth degree in s being equal to zero. After solving for the six roots, we take the real roots (since we know s is real) and substitute them into equation (B5) to determine the possible values of $\underline{I}$, i.e.

$$\underline{I} = \left[A - sI + 0.08\, sA\right]^{-1} \underline{V}_1 \tag{3.2.3}$$

We now take these values of $\underline{I}$ and substitute them into the expressions for Q and SNR. The solution we are looking for is given by the $\underline{I}$ which satisfies

$$Q = \frac{\underline{I}^* \underline{I}}{\underline{I}^* A\, \underline{I}} = 0.08$$

and gives the highest value of the

$$\mathrm{SNR} = \frac{|\underline{I}^* \underline{V}_1|^2}{\underline{I}^* A\, \underline{I}}$$

Numerically, we found the following six roots of the polynomial, the real roots being allowable values of s. Corresponding to these four allowable values of s we found the values of the Q factor; corresponding to the two values of s for which the Q factor is equal to 0.08 we found the two values of the SNR.


s                    Q        SNR

121.0 + j 0.198      -        -
121.0 - j 0.198      -        -
-112.7               0.080    0.058
-52.2                0.080    0.187
-61.8                0.070    0.084
-61.1                0.071    0.090

The solution to the first problem, i.e. that value of $\underline{I}$ which maximizes the SNR subject to the constraint Q = 0.08 for a broadside array, is the value of $\underline{I}$ corresponding to s = -52.160. For this value of s, $\underline{I}$ is given by

$$\underline{I} = \mathrm{col}\left[\,0.086,\ 0.007,\ 0.007,\ 0.086\,\right]$$

and the maximum SNR we can achieve subject to the constraint Q = 0.08 is

SNR = 0.187.

The second example is more complicated, because the vector space we are working in consists of complex vectors (e.g. $\underline{V}_1$) over a complex scalar field (e.g. the scalar r in equation B3).

Here

$$\underline{V}_1 = \mathrm{col}\left[\,e^{\,j(-3\pi)(0.4)},\ e^{\,j(-\pi)(0.4)},\ e^{\,j\pi(0.4)},\ e^{\,j 3\pi(0.4)}\,\right]$$

and we may choose for our complete set the following vectors

$$\underline{a}_1 = \underline{V}_1, \quad
\underline{a}_2 = \begin{bmatrix} -e^{\,j(-3\pi)(0.4)} \\ e^{\,j(-\pi)(0.4)} \\ 0 \\ 0 \end{bmatrix}, \quad
\underline{a}_3 = \begin{bmatrix} -e^{\,j(-3\pi)(0.4)} \\ 0 \\ e^{\,j\pi(0.4)} \\ 0 \end{bmatrix}, \quad
\underline{a}_4 = \begin{bmatrix} -e^{\,j(-3\pi)(0.4)} \\ 0 \\ 0 \\ e^{\,j 3\pi(0.4)} \end{bmatrix}$$

The W matrix (equation B8) has vectors $\underline{a}_1, \underline{W}_2, \underline{W}_3, \underline{W}_4$ as columns, where

$$\underline{W}_i = (0.11A - I)\, \underline{a}_i\, s^2 + 2A\, \underline{a}_i\, s + A\,(0.11A - I)^{-1} A\, \underline{a}_i, \qquad i = 2, 3, 4 \tag{3.2.5}$$

The elements of this matrix are complex polynomials in s of degree two, except for the first column, whose elements are just complex scalars. In this case, equation (B8) can be rewritten in terms of real and imaginary parts as follows (consider a 2x2 W matrix for simplicity):

$$\begin{bmatrix} W_{11r} + jW_{11i} & W_{12r} + jW_{12i} \\ W_{21r} + jW_{21i} & W_{22r} + jW_{22i} \end{bmatrix}
\begin{bmatrix} h_{1r} + jh_{1i} \\ h_{2r} + jh_{2i} \end{bmatrix} =
\begin{bmatrix} 0 + j0 \\ 0 + j0 \end{bmatrix} \tag{3.2.6}$$

This may be rearranged into the following 4x4 matrix equation

$$\begin{bmatrix}
W_{11r} & -W_{11i} & W_{12r} & -W_{12i} \\
W_{11i} & W_{11r} & W_{12i} & W_{12r} \\
W_{21r} & -W_{21i} & W_{22r} & -W_{22i} \\
W_{21i} & W_{21r} & W_{22i} & W_{22r}
\end{bmatrix}
\begin{bmatrix} h_{1r} \\ h_{1i} \\ h_{2r} \\ h_{2i} \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$

or

$$W\, \underline{H} = \underline{0} \tag{3.2.7}$$

where the new W matrix and H vector have twice the dimension indicated by equation (B8) and are now real.
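
The rearrangement from the complex equation (3.2.6) to the real equation (3.2.7) is mechanical: each complex entry $w = w_r + j w_i$ becomes the 2x2 real block $[[w_r, -w_i], [w_i, w_r]]$, and each complex unknown becomes the pair $(h_r, h_i)$. The sketch below illustrates this with hypothetical helper names and made-up test values; it is not the author's program.

```python
def real_embedding(w):
    """Map an n x n complex matrix to the equivalent 2n x 2n real matrix."""
    n = len(w)
    big = [[0.0] * (2 * n) for _ in range(2 * n)]
    for k in range(n):
        for l in range(n):
            re, im = w[k][l].real, w[k][l].imag
            big[2 * k][2 * l] = re
            big[2 * k][2 * l + 1] = -im
            big[2 * k + 1][2 * l] = im
            big[2 * k + 1][2 * l + 1] = re
    return big


def stack(h):
    """Interleave real and imaginary parts: [h1r, h1i, h2r, h2i, ...]."""
    out = []
    for z in h:
        out.extend([z.real, z.imag])
    return out


def matvec(m, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


# Made-up example values, used only to illustrate the equivalence.
W_EXAMPLE = [[1 + 2j, -0.5j], [3.0 + 0j, 0.25 - 1j]]
H_EXAMPLE = [-1 + 0j, 0.7 + 0.3j]

COMPLEX_PRODUCT = matvec(W_EXAMPLE, H_EXAMPLE)
REAL_PRODUCT = matvec(real_embedding(W_EXAMPLE), stack(H_EXAMPLE))
```

`REAL_PRODUCT` reproduces the interleaved real and imaginary parts of `COMPLEX_PRODUCT`, so the complex system vanishes exactly when the real system of twice the dimension does.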
From Appendix B we know that $h_{1r} = -1$ and $h_{1i} = 0$, thus the H vector is not null, and hence the determinant of the W matrix must vanish. Setting the determinant of this W matrix equal to zero results in a polynomial of twelfth degree in s being equal to zero. Now we can theoretically proceed as before. However, the numerical computation of the twelfth degree polynomial coefficients is exceedingly time consuming. We will now demonstrate that instead of having to form and solve for the roots of a twelfth degree polynomial, we can transform the problem into one of finding the eigenvalues of a 16x16 matrix, which is far easier to do numerically.

We may rewrite equation (3.2.7) in the form

$$(A_1 s^2 + A_2 s + A_3)\,h = 0 \tag{3.2.8}$$

where $A_1$ and $A_2$ are 8x8 singular matrices (their first two columns are zero), and $A_3$ is an 8x8 invertible matrix, when we consider the four element array of example two. The problem is to find the twelve values of s for which (3.2.8) holds. Letting $y = 1/s$ (that is, dividing by $s^2$) and multiplying by $A_3^{-1}$ gives

$$(y^2 I + A_3^{-1}A_2\,y + A_3^{-1}A_1)\,h = 0 \tag{3.2.9}$$

In terms of the two state variables

$$x_1 = h \tag{3.2.10a}$$
$$x_2 = y\,h \tag{3.2.10b}$$

equation (3.2.9) transforms into two first order (in y) equations

$$y\,x_1 = x_2 \tag{3.2.11a}$$
$$y\,x_2 = -A_3^{-1}A_1\,x_1 - A_3^{-1}A_2\,x_2 \tag{3.2.11b}$$

Define the 16x16 matrix G by

$$G = \begin{bmatrix} 0 & I \\ -A_3^{-1}A_1 & -A_3^{-1}A_2 \end{bmatrix} \tag{3.2.12}$$

With $x = \mathrm{col}(x_1, x_2)$, this gives

$$y\,x = G\,x \tag{3.2.13}$$

Thus if s satisfies equation (3.2.7), $y = 1/s$ will satisfy equation (3.2.13), or

$$(G - yI)\,x = 0 \tag{3.2.14}$$

Therefore, instead of solving for those values of s for which equation (3.2.7) holds, we may solve for the eigenvalues $y = 1/s$ of the matrix G. This is much simpler.
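The linearization above can be checked on a small case. The following is a minimal sketch, not the computation used in this report: it assumes hypothetical 2x2 diagonal stand-ins for $A_1$, $A_2$, $A_3$ (chosen so the roots s are known by hand), builds G as in (3.2.12), and confirms that each known root gives an eigenpair $(y, x)$ with $y = 1/s$ and $x = \mathrm{col}(h, yh)$. The helper names `companion` and `residual` are illustrative.

```python
def companion(M1, M2):
    """Build G = [[0, I], [-M1, -M2]] from two n x n matrices (nested lists).

    Here M1 stands for A_3^{-1} A_1 and M2 for A_3^{-1} A_2.
    """
    n = len(M1)
    top = [[0.0] * n + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    bottom = [[-M1[i][j] for j in range(n)] + [-M2[i][j] for j in range(n)]
              for i in range(n)]
    return top + bottom


def residual(G, y, x):
    """Largest entry of |G x - y x|; zero when (y, x) is an eigenpair of G."""
    return max(abs(sum(g * v for g, v in zip(row, x)) - y * xi)
               for row, xi in zip(G, x))


# Hypothetical diagonal test matrices (assumption, not data from the report):
#   component 0: s^2 - 3 s + 2 = 0  ->  s = 1, 2
#   component 1: s^2 - 5 s + 4 = 0  ->  s = 1, 4
a1, a2, a3 = [1.0, 1.0], [-3.0, -5.0], [2.0, 4.0]
n = 2
M1 = [[a1[i] / a3[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
M2 = [[a2[i] / a3[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
G = companion(M1, M2)

known_roots = [(1.0, 0), (2.0, 0), (1.0, 1), (4.0, 1)]  # (root s, component index)
residuals = []
for s, i in known_roots:
    y = 1.0 / s
    h = [1.0 if m == i else 0.0 for m in range(n)]
    x = h + [y * v for v in h]          # x = col(x_1, x_2) = col(h, y h)
    residuals.append(residual(G, y, x))

print(residuals)
```

In a real problem the roots are not known in advance; one would hand G to a general eigenvalue routine and take $s = 1/y$ for each nonzero eigenvalue.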


Using this approach, we found numerically

    y = 1/s       Q         SNR
    -0.0457     0.0644       -
    -0.0457     0.0644       -
    -0.0463     0.0636       -
    -0.0464     0.0642       -
    -0.0461     0.0638       -
    -0.0973     0.110      0.438
    -0.0077     0.110      0.009

The remaining solutions were complex.
The best SNR we can get when the Q factor is constrained to 0.11 is SNR = 0.438. For this value of SNR, the complex vector is given by:

    -0.096 + j 0.059
     0.037 - j 0.100
     0.037 + j 0.100
    -0.096 - j 0.059

Thus we have developed a very fast numerical technique to solve for the maximum SNR an array processor can achieve subject to a constraint on the super-gain ratio. Our next major problem is to develop an adaptive algorithm which will automatically adjust the tap weights of our array processor in such a way as to maximize the SNR subject to a constraint on the super-gain ratio. For the special cases where we have a linear array of four isotropic detectors spaced d = 0.8λ (d = 0.4λ) apart, embedded in a uniform noise field, with the signal impinging from broadside (endfire), and with Q constrained to be equal to or less than 0.08 (0.11), we expect our adaptive array processor, in the steady state, to have an output SNR which is equal to (or very close to) 0.187 (0.438). We will begin considering the design of adaptive algorithms in the next chapter.
Appendix A Super-Gain Ratio

It is well known that for any given aperture with a sufficiently large number of degrees of freedom (e.g. for any given detector array aperture with a sufficiently large number of array elements in it), it is possible, in theory, to obtain very high gain by using those excitations which maximize the array signal-to-noise ratio (SNR) or some similar quantity. However, this high gain is obtained at the expense of having a very large super-gain ratio (i.e. the sensitivity of the array power pattern, or gain, or SNR to small variations in the array excitations and element positions is very high). In practice therefore, since the excitations and element positions can only be controlled to within certain tolerances, it is almost impossible to actually construct super-gain arrays. To find out how well we can do in practice, we should use those excitations which are derived by maximizing the array SNR subject to a constraint on the super-gain ratio.

In this derivation of the super-gain ratio, taken from Gilbert and Morgan, we will let the positions of the array elements and the element excitations vary randomly about their nominal values, with the restriction that the random position displacements have a spherically symmetrical probability distribution. It will then be shown that the expected value of the power pattern equals the nominal power pattern plus a background power level. The ratio of background power level to the nominal power pattern is directly proportional to the super-gain ratio.


Statistical Formulation of the Super-Gain Ratio 

Consider an antenna array of N elements. Each element has the same directivity pattern $s(r_0)$, where $r_0$ is a unit vector representing some spatial direction, and $s(r_0)$ is a complex-valued vector function giving the amplitude, phase, and polarization of the radiation field over a large sphere centered at the element. For acoustic fields, $s(r_0)$ is a scalar function.

The overall array directivity pattern is given by

$$F(r_0) = s(r_0) \sum_{k=1}^{N} J_k\, e^{jk R_k \cdot r_0} \tag{A1}$$
where $J_k$ is the complex excitation (amplitude and phase), k is the wavenumber, and $R_k$ is a position vector from the origin to the location of the k-th element in the array. As usual for arrays, the pattern may be split into the element directivity pattern times the array factor $f(r_0)$, where

$$f(r_0) = \sum_{k=1}^{N} J_k\, e^{jk R_k \cdot r_0} \tag{A2}$$

Note that the electric field $E(r_0)$ is proportional to the array directivity pattern, i.e. the electric field strength at a point $R\,r_0$ is, for large R, proportional to

$$\frac{1}{R}\, s(r_0)\, f(r_0)$$

Consequently the radiated power is proportional to $|s(r_0)|^2 |f(r_0)|^2$. The power directivity pattern is defined as

$$\Phi(r_0) = |s(r_0)|^2\, |f(r_0)|^2 \tag{A3}$$

Note that for isotropic radiators $|s(r_0)| = 1$.

We will now assume that the excitation coefficients and the positions of the elements have some random variations about their mean or nominal values. Let

$$J_k = I_k + \alpha_k \tag{A4}$$
$$R_k = r_k + \rho_k \tag{A5}$$

where $I_k$ is the nominal value of the excitation current, the $\alpha_k$'s are independent random complex variables with zero mean, $r_k$ is the nominal value of the position vector, the $\rho_k$'s are independent random vectors with mean (0, 0, 0), and all the $\rho_k$'s have the same statistical distribution.

We can now find the expected values of the field and power patterns as follows:

$$E\{f(r_0)\} = E\{e^{jk\rho \cdot r_0}\}\, f_0(r_0) \tag{A6}$$

where $\rho$ is a random vector having the same distribution as the $\rho_k$'s, and $f_0(r_0)$ is the nominal array factor which results when the excitation coefficients and positions equal their nominal values.
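The squared norm of the array factor may be written

$$|f(r_0)|^2 = \Big|\sum_{k=1}^{N} J_k\, e^{jk R_k \cdot r_0}\Big|^2 = \sum_{k=1}^{N}\sum_{l=1}^{N} J_k J_l^*\, e^{jk(R_k - R_l)\cdot r_0}$$

$$= \sum_{k \neq l} (I_k + \alpha_k)(I_l + \alpha_l)^*\, e^{jk(r_k - r_l)\cdot r_0}\, e^{jk(\rho_k - \rho_l)\cdot r_0} + \sum_{k=1}^{N} (I_k + \alpha_k)(I_k + \alpha_k)^*$$

Taking expected values and recalling that the random variables are independent

$$E\{|f(r_0)|^2\} = \big|E\{e^{jk\rho\cdot r_0}\}\big|^2 |f_0(r_0)|^2 + \Big(1 - \big|E\{e^{jk\rho\cdot r_0}\}\big|^2\Big)\sum_{k=1}^{N}|I_k|^2 + \sum_{k=1}^{N} E\{|\alpha_k|^2\} \tag{A7}$$

Multiplying through by the power pattern $|s(r_0)|^2$ of a single element gives the expression for the expected power pattern of the array, namely

$$E\{\Phi(r_0)\} = \big|E\{e^{jk\rho\cdot r_0}\}\big|^2\, \Phi_0(r_0) + |s(r_0)|^2\Big[\Big(1 - \big|E\{e^{jk\rho\cdot r_0}\}\big|^2\Big)\sum_{k=1}^{N}|I_k|^2 + \sum_{k=1}^{N} E\{|\alpha_k|^2\}\Big] \tag{A8}$$

where the power pattern of the nominal array is

$$\Phi_0(r_0) = |s(r_0)|^2\, |f_0(r_0)|^2$$

Note that in the special case where the positions of the elements are known exactly, implying that the vectors $\rho_k$ are identically zero, the general result (A8) reduces to

$$E\{\Phi(r_0)\} = \Phi_0(r_0) + |s(r_0)|^2 \sum_{k=1}^{N} E\{|\alpha_k|^2\} \tag{A9}$$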
Equation (A9) has a simple physical interpretation. It asserts that the expected power pattern is the power pattern of the nominal array, plus a "background" power level which has the same dependence on direction as the pattern of an individual radiator, and is proportional to the sum of the mean-square errors of the excitation coefficients. In order to have the over-all pattern be a good approximation to the nominal pattern $\Phi_0(r_0)$, it is necessary to hold the expected value of the background power well below the maximum value of $\Phi_0(r_0)$.

If the displacements $\rho_k$ are not identically zero, Gilbert and Morgan evaluate $E\{e^{jk\rho\cdot r_0}\}$ by assuming that the statistical distribution of $\rho$ is spherically symmetric, i.e. if we denote the spherical coordinates of $\rho$ by $(\rho, \theta, \phi)$ then the joint probability distribution function $p(\rho, \theta, \phi)$ depends only upon $\rho$. In this case the value of $E\{e^{jk\rho\cdot r_0}\}$ turns out to be independent of $r_0$, and we can define a parameter $\epsilon^2$ (independent of $r_0$) by

$$\big|E\{e^{jk\rho\cdot r_0}\}\big|^{-2} = 1 + \epsilon^2 \tag{A10}$$

From equations (A8) and (A10) we obtain the expected power pattern for a spherically symmetric distribution of element displacements, namely

$$(1 + \epsilon^2)\, E\{\Phi(r_0)\} = \Phi_0(r_0) + |s(r_0)|^2\Big[\epsilon^2 \sum_{k=1}^{N}|I_k|^2 + (1 + \epsilon^2)\sum_{k=1}^{N} E\{|\alpha_k|^2\}\Big] \tag{A11}$$

Again the expected pattern turns out to be the nominal pattern plus a background level with the same distribution as the pattern of a single element.

The problem is next idealized somewhat by assuming that the excitation coefficients can all be controlled to the same relative accuracy, i.e. we suppose there exists a small number $\delta$ such that

$$E\{|\alpha_k|^2\} = \delta^2\, |I_k|^2, \qquad k = 1, 2, \ldots, N \tag{A12}$$

Then (A11) becomes

$$(1 + \epsilon^2)\, E\{\Phi(r_0)\} = \Phi_0(r_0) + |s(r_0)|^2\big[(1 + \delta^2)\,\epsilon^2 + \delta^2\big]\sum_{k=1}^{N}|I_k|^2 \tag{A13}$$

This expression includes the effects of both excitation and position errors. If we define $\Delta^2 = (1 + \delta^2)\,\epsilon^2 + \delta^2$, then the ratio of background power level to the average nominal power level is

$$\frac{\Delta^2 \sum_{k=1}^{N}|I_k|^2 \int_\Omega |s(r_0)|^2\, d\Omega}{\int_\Omega \Phi_0(r_0)\, d\Omega}
= \frac{\Delta^2 \sum_{k=1}^{N}|I_k|^2 \int_\Omega |s(r_0)|^2\, d\Omega}{\int_\Omega |s(r_0)|^2\, \Big|\sum_{k=1}^{N} I_k\, e^{jk r_k\cdot r_0}\Big|^2 d\Omega}$$

For isotropic radiators $|s(r_0)| = 1$, so that the ratio becomes

$$\frac{\Delta^2 \sum_{k=1}^{N}|I_k|^2}{\int_\Omega \Big|\sum_{k=1}^{N} I_k\, e^{jk r_k\cdot r_0}\Big|^2 d\Omega} = \Delta^2 Q$$

where

$$Q = \frac{\sum_{k=1}^{N}|I_k|^2}{\int_\Omega \Big|\sum_{k=1}^{N} I_k\, e^{jk r_k\cdot r_0}\Big|^2 d\Omega} \tag{A14}$$

Using the vector notation of section 2.1 (see equations (2.1.1) and (2.1.4)) we may rewrite Q as

$$Q = \frac{x^* x}{x^* B\, x} \tag{A15}$$


Q is a positive real number, known as the super-gain ratio, and is a measure of the sensitivity of the pattern to random errors in the excitations and positions of the array elements. Since in practice $\Delta^2$ is never zero, an array with too large a value of Q is unacceptable.

Although Q has been derived as a result of statistical considerations, it can also be interpreted in terms of the efficiency of the array as an energy radiator. If we imagine the array elements to have a certain ohmic resistance, and the excitation coefficients to correspond to the element currents, then $\sum_k |I_k|^2$ is a measure of the power which is lost in the form of heat, and Q is the ratio of dissipated power to average nominal power. Thus a large value of Q corresponds to high ohmic losses for a given amount of radiated power.
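To make the meaning of Q concrete, the sketch below evaluates it for a linear array of isotropic elements. It rests on two assumptions that are not taken from this report: the integral over directions is treated as an average over the unit sphere (normalized by 4π), so the values may differ from the report's Q by a constant factor; and the resulting matrix entries are $B_{mn} = \sin(k d_{mn})/(k d_{mn})$, the standard sphere average of $e^{jk d_{mn}\cos\theta}$. The function names are illustrative.

```python
import math


def sinc_matrix(positions_wl):
    """B[m][n] = sin(kd)/(kd), d = |z_m - z_n|, with positions given in wavelengths."""
    N = len(positions_wl)
    B = [[1.0] * N for _ in range(N)]
    for m in range(N):
        for n in range(N):
            kd = 2.0 * math.pi * abs(positions_wl[m] - positions_wl[n])
            if kd > 0.0:
                B[m][n] = math.sin(kd) / kd
    return B


def super_gain_ratio(excitations, positions_wl):
    """Q = x* x / (x* B x) under the sphere-average normalization assumed above."""
    x = [complex(c) for c in excitations]
    B = sinc_matrix(positions_wl)
    num = sum(abs(c) ** 2 for c in x)
    den = sum((x[m].conjugate() * B[m][n] * x[n]).real
              for m in range(len(x)) for n in range(len(x)))
    return num / den


# Uniform excitation at half-wavelength spacing: B is (numerically) the identity.
q_half_wave = super_gain_ratio([1, 1, 1, 1], [0.0, 0.5, 1.0, 1.5])

# Closely spaced elements with alternating signs radiate little power per unit
# excitation, so Q becomes large (the super-gain regime).
q_super = super_gain_ratio([1, -1, 1, -1], [0.0, 0.1, 0.2, 0.3])

print(q_half_wave, q_super)
```

The second case illustrates the point of this appendix: excitations that nearly cancel in every direction are the ones most sensitive to small errors.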
Appendix B Maximization of SNR Subject to a Constraint

We will find the value of x that maximizes $\dfrac{x^* C x}{x^* A x}$ subject to the constraint $\dfrac{x^* x}{x^* B x} = q$, a real constant, where A, B, and C are Hermitian positive definite matrices, and $C = a_1 a_1^*$. This appendix represents work done by Lo, Lee and Lee (19).

Introducing a real scalar Lagrange multiplier $\lambda$, the solution can be obtained by differentiating L with respect to x, and setting the result equal to zero, where

$$L = \frac{x^* C x}{x^* A x} + \lambda\, \frac{x^* x}{x^* B x} \tag{B1}$$


Thus

$$\delta x^* \left\{ \frac{Cx\,(x^* A x) - Ax\,(x^* C x)}{(x^* A x)^2} + \lambda\, \frac{x\,(x^* B x) - Bx\,(x^* x)}{(x^* B x)^2} \right\}
+ \left\{ \frac{(x^* A x)\,x^* C - (x^* C x)\,x^* A}{(x^* A x)^2} + \lambda\, \frac{(x^* B x)\,x^* - (x^* x)\,x^* B}{(x^* B x)^2} \right\} \delta x = 0$$

Since A, B, and C are Hermitian

$$(x^* A\, \delta x) = (\delta x^* A x)^*$$
$$(x^* B\, \delta x) = (\delta x^* B x)^*$$
$$(x^* C\, \delta x) = (\delta x^* C x)^*$$

Making this substitution in the second term of the last equation results in the second term becoming

$$\left( \delta x^* \left\{ \frac{Cx\,(x^* A x) - Ax\,(x^* C x)}{(x^* A x)^2} + \lambda\, \frac{x\,(x^* B x) - Bx\,(x^* x)}{(x^* B x)^2} \right\} \right)^*$$

Note that the terms inside the braces are equal to the terms inside the braces in the first term of the last equation. Thus, the overall equation is of the form

$$\delta x^*\, y + (\delta x^*\, y)^* = 0$$

Since this equation must be true for all possible values of the real and imaginary parts of $\delta x$, this implies $y = 0$. Thus

$$\frac{Cx\,(x^* A x) - Ax\,(x^* C x)}{(x^* A x)^2} + \lambda\, \frac{x\,(x^* B x) - Bx\,(x^* x)}{(x^* B x)^2} = 0 \tag{B2}$$


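But we can assume x is normalized to 1, i.e. $x^* x = 1$, because both the function we are maximizing, $\dfrac{x^* C x}{x^* A x}$, and the constraint, $\dfrac{x^* x}{x^* B x}$, are independent of the magnitude of x. Multiplying equation (B2) by $(x^* A x)$, letting $C = a_1 a_1^*$ in the first term, and multiplying the third and fourth terms by $x^* x = 1$, gives

$$a_1 (a_1^* x) - Ax\, \frac{(x^* C x)}{(x^* A x)} + \lambda\, (x^* A x)\, \frac{(x^* x)}{(x^* B x)}\, x - \lambda\, (x^* A x)\, \frac{(x^* x)^2}{(x^* B x)^2}\, Bx = 0$$

Since $q = \dfrac{x^* x}{x^* B x}$ we have

$$a_1 (a_1^* x) - Ax\, \frac{(x^* C x)}{(x^* A x)} + \lambda q\, (x^* A x)\, x - \lambda q^2\, (x^* A x)\, Bx = 0$$

Combining terms

$$(a_1^* x)\, a_1 = \left[ \frac{(x^* C x)}{(x^* A x)}\, A - \lambda q\, (x^* A x)\, I + \lambda q^2\, (x^* A x)\, B \right] x$$

Multiplying by the real scalar $\dfrac{(x^* A x)}{(x^* C x)}$ gives

$$\frac{(a_1^* x)(x^* A x)}{(x^* C x)}\, a_1 = \left[ A - \lambda q\, \frac{(x^* A x)^2}{(x^* C x)}\, I + \lambda q^2\, \frac{(x^* A x)^2}{(x^* C x)}\, B \right] x \tag{B3}$$

Define

$$r = \frac{(a_1^* x)(x^* A x)}{(x^* C x)}, \quad \text{a complex scalar}$$

$$s = \frac{\lambda q\, (x^* A x)^2}{(x^* C x)}, \quad \text{a real scalar}$$

thus

$$r\, a_1 = [A - sI + qsB]\, x \tag{B4}$$

The solution for x is

$$x = r\, K^{-1} a_1 \tag{B5}$$

where r is a complex scalar, depending upon x, and $K = [A - sI + qsB]$ is a Hermitian matrix which also depends upon x.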
In addition to equation (B5), the constraint equation must also be satisfied, thus

$$q = \frac{x^* x}{x^* B x} \tag{B6}$$

Since only the direction of x and not its magnitude (we showed its magnitude could be assumed equal to unity) is of interest, the scalar r which multiplies all components of x may be disregarded. The only unknown, then, in the simultaneous solution of equations (B5) and (B6) is the real scalar s, which is proportional to the Lagrange multiplier $\lambda$. Inserting (B5) into (B6) one obtains a characteristic equation for s,

$$\frac{a_1^* K^{-1} K^{-1} a_1}{a_1^* K^{-1} B K^{-1} a_1} = q$$

This may be rewritten in the form

$$a_1^* K^{-1} qB K^{-1} a_1 - a_1^* K^{-1} I K^{-1} a_1 = 0$$

$$a_1^* K^{-1} [qB - I] K^{-1} a_1 = 0 \tag{B7}$$

Because the unknown s is contained in K, a direct numerical solution of (B7) is very difficult. However, Lo, Lee and Lee observed that equation (B7) states that the vector $a_1$ is orthogonal to the vector $K^{-1}[qB - I]K^{-1} a_1$. Thus the vector $K^{-1}[qB - I]K^{-1} a_1$ must lie in the space orthogonal to the space spanned by $a_1$. A complete set $\{a_n\}$ with $a_1$ as one of its elements can be easily constructed, e.g. if $a_1 = \mathrm{col}(1, 1, \ldots, 1)$ we may choose

$$a_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad
a_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix},\quad \ldots,\quad
a_N = \begin{bmatrix} -1 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$
The vector $K^{-1}[qB - I]K^{-1} a_1$ must be a linear combination of the vectors $a_2, a_3, \ldots, a_N$. Let it be

$$K^{-1}[qB - I]K^{-1} a_1 = \sum_{n=2}^{N} h_n\, a_n$$

which yields

$$a_1 = \sum_{n=2}^{N} h_n\, K\,[qB - I]^{-1} K\, a_n$$

Rearranging, with $K = A + s(qB - I)$, gives

$$\sum_{n=2}^{N} [A + s(qB - I)]\,[qB - I]^{-1}\,[A + s(qB - I)]\, a_n\, h_n - a_1 = 0$$

$$\sum_{n=2}^{N} \left[ A(qB - I)^{-1} A\, a_n + As\, a_n + sA\, a_n + s^2 (qB - I)\, a_n \right] h_n - a_1 = 0$$

$$-a_1 + \sum_{n=2}^{N} \left[ s^2 (qB - I)\, a_n + 2sA\, a_n + A(qB - I)^{-1} A\, a_n \right] h_n = 0$$

or

$$-a_1 + h_2 W_2 + h_3 W_3 + \ldots + h_N W_N = 0$$

In matrix form

$$W H = 0 \tag{B8}$$

where W is a matrix with, in general, complex vectors $a_1, W_2, \ldots, W_N$ as columns, i.e.

$$W_n = s^2 (qB - I)\, a_n + 2sA\, a_n + A(qB - I)^{-1} A\, a_n, \qquad n = 2, 3, \ldots, N$$

and

$$H = \mathrm{col}(-1, h_2, \ldots, h_N)$$

Since H is not a null vector, the determinant of W in equation (B8) must vanish, i.e.

$$\det\,[\,a_1, W_2, \ldots, W_N\,] = 0 \tag{B9}$$

This results in a (sometimes complex) polynomial of degree 2(N-1) in the unknown s, and thus the roots can be numerically determined. One of them will give the absolute maximum of $\dfrac{x^* C x}{x^* A x}$, because once the possible values of s have been found, the direction of x can be found from equation (B5) and the problem is solved.
CHAPTER 4

Minimization of the Mean-Squared-Error (MSE) Subject to One Linear Constraint

Our objective is to consider an adaptive algorithm which will maximize the SNR subject to a constraint on the super-gain ratio when unknown interfering noise is present. Because the SNR and super-gain ratio are nonlinear quantities, it is difficult to prove convergence of our algorithm to the optimal solution, or to analytically find the algorithm's rate of convergence. Thus, for the purpose of mathematical tractability (the nonlinear algorithm will be simulated on a computer in chapter six to obtain some numerical indication of convergence and convergence rate), and because (1) the criterion of minimizing the MSE is important in its own right, (2) linear constraints may appear in similar problems, (3) nonlinear constraints are approximately linear near the solution point, and (4) the projection method used in the linear case is also applicable to the nonlinear case, we will consider in this chapter an adaptive algorithm which minimizes the MSE subject to a linear constraint. Specifically, we will find the Lagrange solution to the problem of minimizing the MSE subject to a linear constraint and then prove that an algorithm of the form $W_{j+1} = W_j - k P\, \nabla_{W_j}(\mathrm{MSE})$ converges to the Lagrange solution, when the gradient $\nabla_{W_j}(\mathrm{MSE})$ is (1) known exactly, (2) estimated, and (3) estimated by an estimate which contains additive noise.



Section 4.1 Derivation of Mean Squared Error and Constraint Equation

The processor configuration is shown in Fig. 4.1.1, where $\Delta$ represents a time delay, $s_j = \mathrm{col}(s_{1j}, s_{2j}, \ldots, s_{nj})$ is the stochastic signal at the outputs of the tapped delay lines at time (iteration) j, the W's are the multiplicative tap weights, and $d_j$ is some known scalar function of the vector $s_j$, i.e. $d_j$ represents the desired array output at time j.

Fig. 4.1.1 Processor Configuration

From Fig. 4.1.1 we have

$$e_j = d_j - W_j^T s_j \tag{4.1.1}$$

$$e_j^2 = d_j^2 - 2 d_j\, s_j^T W_j + W_j^T s_j s_j^T W_j \tag{4.1.2}$$
When the input signal can be regarded as a stationary, ergodic random process, then

$$E\{s_j\} = \bar{s} \quad \text{and} \quad E\{d_j\} = \bar{d}$$

Our problem is to devise an algorithm that will adjust the weights to their LMS value subject to a linear constraint. Toward this end we have already found an expression (equation 4.1.2) for the MSE, and the remainder of this section will be devoted to finding expressions for the minimum value of the MSE when we have no constraint, mentioning an adaptive algorithm that will automatically adjust the tap weights to their unconstrained LMS values, and writing an expression for any arbitrary linear constraint on W.
Taking the expected value of equation (4.1.2) gives

$$E\{e_j^2\} = \overline{e_j^2} = \overline{d_j^2} - 2\,\phi^T(s, d)\, W_j + W_j^T\, \phi(s, s)\, W_j \tag{4.1.3}$$

where

$$\phi(s, d) = E\{s_j d_j\} = \begin{bmatrix} E\{s_{1j} d_j\} \\ \vdots \\ E\{s_{nj} d_j\} \end{bmatrix} \tag{4.1.4}$$

$$\phi(s, s) = E\{s_j s_j^T\} \tag{4.1.5}$$

Taking the gradient of $\overline{e_j^2}$ yields

$$\nabla(\overline{e_j^2}) = -2\,\phi(s, d) + 2\,\phi(s, s)\, W_j \tag{4.1.6}$$


To find the least-mean-square (LMS) set of weights, $W_{LMS}$, that minimizes $\overline{e_j^2}$ when there is no constraint, we set $\nabla(\overline{e_j^2}) = 0$. Thus

$$\phi(s, d) = \phi(s, s)\, W_{LMS} \tag{4.1.7a}$$
$$W_{LMS} = \phi^{-1}(s, s)\, \phi(s, d) \tag{4.1.7b}$$

The LMS error is achieved by choosing the optimal weight vector given by equation (4.1.7b). An expression for the minimum mean-square error may be obtained by substituting (4.1.7a) into (4.1.3)

$$\overline{e^2}_{\min} = \min(\overline{e_j^2}) = \overline{d_j^2} - W_{LMS}^T\, \phi(s, s)\, W_{LMS} \tag{4.1.8}$$

Note that $\min(\overline{e_j^2})$ is independent of j ($\overline{d_j^2}$ is independent of j).

Widrow, Lucky and others (12)-(18) have investigated adaptive algorithms which automatically adjust the tap weights to their unconstrained LMS values. One such algorithm is given by

$$W_{j+1} = W_j - k\, \nabla(\overline{e_j^2}) \tag{4.1.9}$$

Substituting (4.1.6) into (4.1.9) gives

$$W_{j+1} = W_j + 2k\,\phi(s, d) - 2k\,\phi(s, s)\, W_j \tag{4.1.10}$$

Note that equation (4.1.10) is a linear equation in W. This means we can easily solve for $\lim_{j\to\infty} W_j$ and other quantities of interest, and it is the main reason we are using minimum mean-square error as our criterion. The abovementioned researchers have proven that, by using the algorithm of equation (4.1.9), $W_j$ converges to $W_{LMS}$.

Any arbitrary linear constraint on W can be written in the form

$$W^T n_1 - a \geq 0 \tag{4.1.11}$$

where $n_1$ is a unit normal to the hyperplane $W^T n_1 - a = 0$.

Our problem now is to (1) find the optimum value of the weights, $W_{opt}$, which yields the minimum MSE (equation 4.1.2) subject to the constraint (4.1.11) and (2) devise an adaptive algorithm, similar to (4.1.9), which will make the tap weights W converge to this $W_{opt}$. The next section attacks the first problem.


Section 4.2 Analytic (Lagrange) Solution

In this section we will use a Lagrange multiplier technique to find the optimum value of the weights, $W_{opt}$, which yields the minimum mean-square error subject to the linear constraint (4.1.11).

Let us first rewrite equation (4.1.3) for $\overline{e_j^2}$ as follows. Substituting (4.1.7a) and (4.1.8) into (4.1.3) gives

$$\overline{e_j^2} = \overline{e^2}_{\min} + W_{LMS}^T\, \phi(s, s)\, W_{LMS} - 2\, W_{LMS}^T\, \phi(s, s)\, W + W^T \phi(s, s)\, W$$

But

$$W_{LMS}^T\, \phi\, W_{LMS} - 2\, W_{LMS}^T\, \phi\, W + W^T \phi\, W = (W - W_{LMS})^T\, \phi\, (W - W_{LMS})$$

Thus

$$\overline{e_j^2} = \overline{e^2}_{\min} + (W - W_{LMS})^T\, \phi\, (W - W_{LMS}) \tag{4.2.1}$$

The problem is to minimize (4.2.1) subject to (4.1.11). Let us investigate what the solution looks like both graphically and analytically. Graphically we have

Fig. 4.2.1 Typical MSE level curves and constraint

Since the objective function is quadratic, the solution is either:

1. $W = W_{LMS}$, or

2. W = the solution to the Lagrange multiplier problem when (4.1.11) holds as an equality, i.e. $W^T n_1 - a = 0$.

We are only interested in case (2) in this section, because the algorithms of Widrow and Lucky will work in case (1).

Analytically we must minimize

$$L = \overline{e^2}_{\min} + (W - W_{LMS})^T\, \phi\, (W - W_{LMS}) + \alpha\, (W^T n_1 - a) \tag{4.2.2}$$

Taking differentials with respect to W we have

$$\delta L = (\delta W^T)\, \phi\, W + W^T \phi\, (\delta W) - (\delta W^T)\, \phi\, W_{LMS} - W_{LMS}^T\, \phi\, (\delta W) + \alpha\, (\delta W^T)\, n_1 = 0 \tag{4.2.3}$$

But, since $\phi$ is symmetric,

$$(\delta W^T)\,[\phi W - \phi W_{LMS}] = [W^T \phi - W_{LMS}^T \phi]\,(\delta W)$$

so (4.2.3) may be rewritten as

$$\left\{ \alpha\, n_1^T + 2\,[W^T \phi - W_{LMS}^T \phi] \right\} (\delta W) = 0$$

which must be true for all $\delta W$, giving

$$\alpha\, n_1^T + 2\,[W^T \phi - W_{LMS}^T \phi] = 0 \tag{4.2.4}$$

Equation (4.2.4) together with the constraint equation (4.1.11), taken as an equality, must be solved simultaneously for $\alpha$ and W. Doing this yields

$$W_{opt} = \frac{(a - n_1^T W_{LMS})}{(n_1^T \phi^{-1} n_1)}\, \phi^{-1} n_1 + W_{LMS} \tag{4.2.5}$$

This is the analytic solution for the least mean square value of the tap weights subject to an arbitrary linear constraint. In the next section we will present an adaptive algorithm which will, in the steady state, make the tap weights converge to the optimum value we have just found in equation (4.2.5).
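As a numerical illustration of the closed-form result (4.2.5), the following minimal sketch evaluates $W_{opt}$ for a two-weight problem and checks the two conditions it was derived from: the constraint holds as an equality, and $\phi\,(W_{opt} - W_{LMS})$ is parallel to $n_1$, as (4.2.4) requires. The numerical values of `phi`, `w_lms`, `n1`, and `a` are hypothetical, and the 2x2 solver is only a convenience for this example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve2(M, v):
    """Solve the 2x2 linear system M z = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]


def constrained_lms(phi, w_lms, n1, a):
    """W_opt = W_LMS + (a - n1.W_LMS) / (n1.phi^{-1} n1) * phi^{-1} n1."""
    z = solve2(phi, n1)                       # z = phi^{-1} n1
    scale = (a - dot(n1, w_lms)) / dot(n1, z)
    return [w + scale * zi for w, zi in zip(w_lms, z)]


# Hypothetical example values (not taken from the report).
phi = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite correlation matrix
w_lms = [1.0, -0.5]              # unconstrained optimum; n1.w_lms = 0.2 < a
n1 = [0.6, 0.8]                  # unit normal to the constraint hyperplane
a = 1.0

w_opt = constrained_lms(phi, w_lms, n1, a)

# phi (W_opt - W_LMS) should be a multiple of n1, so this 2-D cross product vanishes.
diff = [w - l for w, l in zip(w_opt, w_lms)]
g = [dot(row, diff) for row in phi]
cross = g[0] * n1[1] - g[1] * n1[0]

print(w_opt, dot(n1, w_opt), cross)
```

Because the unconstrained optimum violates the constraint in this example, the constrained optimum lies on the hyperplane, which is case (2) of Section 4.2.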
Section 4.3 Use of the Projected Gradient Algorithm to Adaptively Adjust the Tap Weights

The projected gradient algorithm that we will use is a modified version of Rosen's algorithm, which is discussed briefly in Appendix B. It is advisable to read Appendix B before the following sections. The algorithm we will use to minimize the MSE subject to a linear constraint may be thought of intuitively as follows: we want to converge to the vector $W_{opt}$ which minimizes the MSE, which is a function of W, subject to a linear constraint on the vector W. Looking at Fig. 4.3.1 we see intuitively that we can start at a point which satisfies the linear constraint, denote it by point one; find the gradient of the MSE with respect to W at point one and "project" this gradient vector, which lies in an n dimensional vector space (in Fig. 4.3.1 the n dimensional W vector space is of dimension 2), onto the n-1 dimensional (one dimensional in the diagram) subspace which is orthogonal to the one dimensional subspace spanned by the normal $n_1$ to the constraint surface; step along the negative of the projected gradient to a new point, call this point two; and repeat the procedure indefinitely. This procedure may converge to the constrained optimum, denoted by $W_{opt}$, under certain conditions.

Analytically, the projected gradient algorithm is given by

$$W_{j+1} = W_j - k P\, \nabla_{W_j}(\mathrm{MSE})$$

where P is the projection operator $P = I - n_1 n_1^T$ if we have only one constraint (see Appendix B for the more general case), $n_1$ is a unit vector normal to the constraint hyperplane, k is a constant which will be investigated later, and $\nabla_{W_j}(\mathrm{MSE})$ is the gradient of the MSE at time (iteration) j.
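The iteration just described can be sketched in a few lines. This is a minimal illustration for the case where the gradient is known exactly, using $g = 2\phi\,(W - W_{LMS})$; the values of `phi`, `w_lms`, `n1`, `a`, the step size `k`, and the number of steps are hypothetical choices for the example, and the starting point $a\,n_1$ is simply one convenient point on the constraint hyperplane.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gradient(phi, w, w_lms):
    """Exact MSE gradient g = 2 phi (W - W_LMS)."""
    diff = [wi - li for wi, li in zip(w, w_lms)]
    return [2.0 * dot(row, diff) for row in phi]


def project(v, n1):
    """Apply P = I - n1 n1^T, removing the component of v along the unit normal n1."""
    c = dot(n1, v)
    return [vi - c * ni for vi, ni in zip(v, n1)]


# Hypothetical example values (not taken from the report).
phi = [[2.0, 0.5], [0.5, 1.0]]
w_lms = [1.0, -0.5]
n1 = [0.6, 0.8]
a = 1.0
k = 0.2

w = [a * ni for ni in n1]        # feasible start: n1.w = a because |n1| = 1
constraint_errors = []
for _ in range(200):
    pg = project(gradient(phi, w, w_lms), n1)
    w = [wi - k * pi for wi, pi in zip(w, pg)]   # W_{j+1} = W_j - k P g
    constraint_errors.append(abs(dot(n1, w) - a))

w_final = w
pg_final = project(gradient(phi, w_final, w_lms), n1)
print(w_final, max(constraint_errors), pg_final)
```

Each step moves only within the constraint hyperplane, so the constraint stays satisfied, and the iteration stops changing once the projected gradient vanishes.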

Section 4.3.1 The Algorithm, Proof of Convergence, and Bounds on the Rate of Convergence if the Gradient is Known

Let us compute the gradient of the MSE, g, and the gradient projection Pg. From equation (4.1.6)

$$g^T = \nabla_W^T(\overline{e^2}) = -2\,\phi^T(s, d) + 2\, W^T \phi(s, s)$$

Using (4.1.7a) we get

$$g = 2\,\phi\,[W - W_{LMS}] \tag{4.3.1.1}$$

The projection operator is given by

$$P = I - n_1 n_1^T \tag{4.3.1.2}$$

thus

$$P g = [I - n_1 n_1^T]\; 2\,\phi\,[W - W_{LMS}] \tag{4.3.1.3}$$

Our algorithm is

$$W_{j+1} = W_j - k\,[I - n_1 n_1^T]\; 2\,\phi\,[W_j - W_{LMS}] \tag{4.3.1.4}$$

As discussed before, we will start at a point where the constraint is satisfied, and since at every iteration we are projecting onto a subspace where the constraint is satisfied, this implies that the constraint equation is always satisfied, i.e.

$$W_j^T n_1 = a, \qquad j = 0, 1, 2, \ldots$$

Equations (4.3.1.4) constitute a set of n simultaneous first order difference equations. In order to solve them, we need initial conditions. For our "initial" conditions, we will use the fact that the constraint must always be satisfied, and in particular must be satisfied at $j = \infty$, i.e.

$$W_\infty^T n_1 = a \tag{4.3.1.5}$$

Now equations (4.3.1.4) and (4.3.1.5) constitute a set of n first order deterministic difference equations (since W is of dimension n) with initial conditions. We want to investigate whether or not the sequence of W's converges to $W_{opt}$, and if so, what is the rate of convergence?

To answer the first question, we will solve for the asymptotic value of equation (4.3.1.4)

$$0 = -k\,[I - n_1 n_1^T]\; 2\,\phi\,[W_\infty - W_{LMS}]$$

Let

$$x = W_\infty - W_{LMS} \tag{4.3.1.6}$$

then

$$[I - n_1 n_1^T]\,\phi\, x = 0 \tag{4.3.1.7}$$

Again, since W has n components, equations (4.3.1.7) constitute a set of n simultaneous deterministic homogeneous equations in n unknowns. The initial condition (4.3.1.5) becomes

$$n_1^T x = a - n_1^T W_{LMS} \tag{4.3.1.8}$$
Before solving (4.3.1.7) let us consider the following equations.

$$A\, x = 0$$

1. A necessary and sufficient condition for the above n equations to have a nontrivial solution is that the rank of A be less than n, or equivalently, that the determinant of A be zero.

2. If the rank of A is r, where r < n, then the system of equations has exactly n-r linearly independent solutions such that every solution is a linear combination of these n-r linearly independent solutions and every linear combination of the n-r linearly independent solutions is a solution.

Let us now investigate the rank of $[I - n_1 n_1^T]\,\phi$. By definition, the rank of an operator is the dimension of the range space of the operator, thus

$$\mathrm{rank}\,[I - n_1 n_1^T] = n - 1$$

For arbitrary matrices B and C

$$\mathrm{rank}\,(BC) \leq \min(\mathrm{rank}\, B,\ \mathrm{rank}\, C)$$

From this we may conclude that

1. Because $\mathrm{rank}\,[I - n_1 n_1^T] = n - 1$, this implies there exists at least one (possibly nonunique) solution to equations (4.3.1.7).

2. If we know that the rank of $[I - n_1 n_1^T]\,\phi$ equals n-1, this implies there exists a unique (to within a multiplicative constant, which is unique provided the initial condition is satisfied) nontrivial solution to equations (4.3.1.7).

If $\phi$ is invertible, then $\mathrm{rank}\,[I - n_1 n_1^T]\,\phi = \mathrm{rank}\,[I - n_1 n_1^T] = n - 1$. This follows from Halmos (23), Theorem 5, part IV, page 92. Since $\phi$ is a correlation matrix, it is positive semidefinite, and in practice almost always positive definite, which implies that it is invertible. Thus equations (4.3.1.7), together with the initial conditions of equations (4.3.1.8), have a unique solution.
If $W_\infty = W_{opt}$ satisfies (4.3.1.7) and (4.3.1.8) then it is the solution. We will now verify that this is the case. From (4.2.5)

$$W_\infty = W_{opt} = \frac{(a - n_1^T W_{LMS})}{(n_1^T \phi^{-1} n_1)}\, \phi^{-1} n_1 + W_{LMS}$$

$$x = W_\infty - W_{LMS} = \frac{(a - n_1^T W_{LMS})}{(n_1^T \phi^{-1} n_1)}\, \phi^{-1} n_1$$

Substituting this expression for x into (4.3.1.7) and (4.3.1.8) one sees that the equations are satisfied. Thus $W_\infty = W_{opt}$ is the unique solution to equations (4.3.1.7) and (4.3.1.8).


Now that we have shown that the sequence of W's does converge to $W_{opt}$, we will investigate the rate of convergence of the weight vectors to $W_{opt}$, given by (4.2.5)

$$W_{opt} = \frac{(a - n_1^T W_{LMS})}{(n_1^T \phi^{-1} n_1)}\, \phi^{-1} n_1 + W_{LMS}$$

Define

$$q_j = W_j - W_{opt}$$

The algorithm (4.3.1.4) can be rewritten as

$$W_{j+1} = \left[ (I - 2k\phi) + 2k\, n_1 n_1^T \phi \right] W_j + 2k\,[I - n_1 n_1^T]\,\phi\, W_{LMS} \tag{4.3.1.9}$$
After some manipulation (and noting that $[I - n_1 n_1^T]\,\phi\,(W_{opt} - W_{LMS}) = 0$) we have

$$q_{j+1} = \left[ I - 2k\,(I - n_1 n_1^T)\,\phi \right] q_j \tag{4.3.1.10}$$

Since $q_j = W_j - W_{opt}$, by looking at Fig. 4.3.1 we see that $q_j$ always lies in the hyperplane (in the figure this means $q_j$ lies along the constraint line) which is orthogonal to $n_1$, hence

$$q_j = [I - n_1 n_1^T]\, q_j \tag{4.3.1.11}$$

Thus

$$q_{j+1} = [I - n_1 n_1^T]\,[I - 2k\phi]\, q_j \tag{4.3.1.12}$$

and

$$\|q_{j+1}\| \leq \xi\, \|q_j\| \leq \xi^{\,j+1}\, \|q_0\| \tag{4.3.1.13}$$

where

$$\xi = \left\| [I - n_1 n_1^T]\,(I - 2k\phi) \right\| \tag{4.3.1.14}$$

Let us investigate this norm. The correlation matrix $\phi$ is a symmetric positive semidefinite, and in practice almost always positive definite, matrix with positive minimum and maximum eigenvalues $p_1$ and $p_N$ respectively; k is chosen to be a positive number; and $(I - n_1 n_1^T)$ is a projection operator as discussed previously. To bound the norm, we have

$$\xi \leq \|I - n_1 n_1^T\|\; \|I - 2k\phi\| \tag{4.3.1.15}$$

Since $I - n_1 n_1^T$ is a projection operator, its norm is 1, thus

$$\xi \leq \|I - 2k\phi\| = \xi_{\max}$$
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Since (I - 2k 4>) is self-adjoint (see Halmos^^^ ^page 18,0 and Goidstein^^^^ 
page 24) we may bound as follows 

■ 1 

T 


• ^ = sup lx ( I - 2k c|)) X 

-l|x|l = l 

Since (j) is symmetric positive definite 


(4.3.1.16) 


2k pj £ 2k x-^ «j) X < 2k p 


N 


(4.3.1. 17) 


where p^ and are the minimum and maximum eigenvalues of. <|) 
respectively, and |j x |j = 1. This implies' 


and 


thus 


and 


sup 

|x|J = l 


1 - 2k x"^ (}) X > 

1 — 2k Pj^ 

T 

1 — 2k X (|) X < 

1 - 2k p 



1 - 2k pj^< l-2kx"^ (jjx < 

1 

‘x^ (I- 2k <j>) X 

'max , . 

( 

• 


(4.3.1.18a) 


(4.3.1. 18b) 


(4.3.1,19) 


l-2kp^|, |l~2kpj^! > (4.3.1.20) 


Thus 


^ ~ j I 1 - 2k p^l , I 1 - 2k p J j 


(4.3.1, 21) 


If we plot, on $\xi$ vs. $k$ axes, the two curves $\xi = |1 - 2k\rho_1|$ and $\xi = |1 - 2k\rho_N|$
we have

[Fig. 4.3.2: $\xi$ vs. $k$]

A plot of $\xi = \max \{\, |1 - 2k\rho_1| ,\; |1 - 2k\rho_N| \,\}$ looks like

[Fig. 4.3.3: Bounds on $k_{max}$]

The maximum value of $k$ that still insures convergence ($k_{max}$)
is found by setting

$$ -(1 - 2k\rho_N) < 1 $$

which implies

$$ k < \frac{1}{\rho_N} \qquad (4.3.1.22) $$

Thus in this section we have proven that our algorithm converges
to $W_{opt}$ for $k$ sufficiently small. In the next section we will investigate a
more useful algorithm, i.e. an algorithm which does not require a priori
knowledge of $\phi$.
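As a quick numerical illustration of the bound (4.3.1.21) and the step-size limit (4.3.1.22), the following Python sketch evaluates both for a pair of eigenvalues. The function names and the eigenvalues 1 and 4 are illustrative assumptions, not part of the original programs.

```python
def convergence_factor_bound(k, rho_min, rho_max):
    """Bound (4.3.1.21) on the contraction factor xi for step size k."""
    return max(abs(1.0 - 2.0 * k * rho_min), abs(1.0 - 2.0 * k * rho_max))


def k_max(rho_max):
    """Step-size limit (4.3.1.22): convergence requires k < 1 / rho_max."""
    return 1.0 / rho_max


if __name__ == "__main__":
    rho_min, rho_max = 1.0, 4.0  # assumed eigenvalues, for illustration only
    print("k_max =", k_max(rho_max))
    for k in (0.01, 0.1, 0.2, 0.25, 0.252):
        # The bound is below 1 only while k stays under k_max.
        print(k, convergence_factor_bound(k, rho_min, rho_max))
```

For small k the bound is governed by the smallest eigenvalue, and once k exceeds 1/rho_max the term involving the largest eigenvalue pushes the bound above one.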


Section 4.3.2 The Algorithm, Proof of Convergence, and Bounds on
the Rate of Convergence if the Gradient is Estimated


In practice, the mean-square error $\overline{\epsilon_j^2}$ is normally not available.
There are various methods available for estimating $\overline{\epsilon_j^2}$. Here we will
assume the simplest estimate, $\overline{\epsilon_j^2} \approx \epsilon_j^2$, i.e. we are approximating the
average value of $\epsilon_j^2$ by its instantaneous value, which is normally available.
Thus the $i$th component of the gradient is approximately given by
the $i$th partial derivative of $\epsilon_j^2$ with respect to $W_i$:

$$ \frac{\partial \overline{\epsilon_j^2}}{\partial W_i} \approx \frac{\partial \epsilon_j^2}{\partial W_i} = 2\,\epsilon_j\, \frac{\partial \epsilon_j}{\partial W_i} $$

From equation (4.1.1)

$$ \frac{\partial \epsilon_j}{\partial W_i} = -s_{ij} $$

thus

$$ \nabla(\overline{\epsilon_j^2}) \approx \nabla(\epsilon_j^2) = -2\,\epsilon_j\, s_j \qquad (4.3.2.1) $$


We will now use this estimated gradient $\hat{g}_j = -2\,\epsilon_j\, s_j$ in our algorithm, yielding

$$ W_{j+1} = W_j + 2k\,[I - n_1 n_1^T]\, s_j\, \epsilon_j $$

or, using equation (4.1.1),

$$ W_{j+1} = W_j + 2k\,[I - n_1 n_1^T]\, s_j\, (d_j - s_j^T W_j) \qquad (4.3.2.2) $$

The "initial" condition is

$$ W_0^T n_1 = a \qquad (4.3.2.3) $$
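A single iteration of (4.3.2.2) can be sketched in Python as follows. This is only an illustration under the assumption that $n_1$ has unit length; the function name and the sample numbers are hypothetical. The projection is applied by subtracting the component of the correction along $n_1$, which is equivalent to multiplying by $[I - n_1 n_1^T]$.

```python
import math


def projected_lms_step(w, s, d, n1, k):
    """One step of W_{j+1} = W_j + 2k (I - n1 n1^T) s_j (d_j - s_j^T W_j).

    n1 is assumed to be a unit vector.
    """
    err = d - sum(si * wi for si, wi in zip(s, w))   # epsilon_j = d_j - s_j^T W_j
    corr = [2.0 * k * err * si for si in s]          # unprojected correction
    along = sum(ni * ci for ni, ci in zip(n1, corr)) # its component along n1
    return [wi + ci - along * ni for wi, ci, ni in zip(w, corr, n1)]


if __name__ == "__main__":
    n1 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0, 0.0]
    w0 = [3 * math.sqrt(2), 0.0, 0.0, 0.0]           # satisfies n1^T w0 = 3
    s = [0.3, -1.2, 0.5, 2.0]                        # one hypothetical data sample
    w1 = projected_lms_step(w0, s, sum(s), n1, 0.01)
    print(w1, sum(a * b for a, b in zip(n1, w1)))    # constraint value is unchanged
```

Because every correction is orthogonal to $n_1$, a weight vector that satisfies the constraint initially keeps satisfying it at every later iteration.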


$W_j$ is now a random vector, and equations (4.3.2.2) represent a set of
first-order stochastic difference equations, with forcing stochastic
vector $s_j$.

Let us see what the asymptotic expected value of $W_j$ is:

$$ E\{W_{j+1}\} = E\{W_j\} + 2k\,[I - n_1 n_1^T]\,[\,\phi(s,d) - \phi(s,s)\, E\{W_j\}\,] $$

because

$$ E\{ s_j s_j^T W_j \} = E\{ s_j s_j^T \}\, E\{ W_j \} $$

i.e. $W_j$ depends upon $s_0, \ldots, s_{j-1}$ but is independent of $s_j$.

From equation (4.1.7a), $\phi(s,d) = \phi(s,s)\, W_{LMS}$, so that

$$ E\{W_{j+1}\} = E\{W_j\} + 2k\,[I - n_1 n_1^T]\,\phi(s,s)\,[\,W_{LMS} - E\{W_j\}\,] \qquad (4.3.2.4) $$

Taking the expected value of (4.3.2.3) yields

$$ E\{W_0\}^T n_1 = a \qquad (4.3.2.5) $$

Equations (4.3.2.4) and (4.3.2.5) constitute a set of first-order deterministic
difference equations, exactly the same as equations (4.3.1.4)
and (4.3.1.5). Thus the solution (unique since rank $(I - n_1 n_1^T)\,\phi = n - 1$) is

$$ E\{W_\infty\} = W_{optimum} = W_{LMS} + \phi^{-1} n_1\, (n_1^T \phi^{-1} n_1)^{-1}\, (a - n_1^T W_{LMS}) \qquad (4.3.2.6) $$

We have shown that the mean of $W_j$ converges to $W_{optimum}$. However, since
equations (4.3.2.2) are stochastic, we must also investigate the behavior
of the variance of the random weight vector about its expected asymptotic
value, given by $E\{W_\infty\} = W_{opt}$.


Let

$$ q_j \equiv W_j - W_\infty \qquad (4.3.2.7) $$

In terms of $q_j$, the algorithm (4.3.2.2) becomes

$$ q_{j+1} = q_j - 2k\,[I - n_1 n_1^T]\, s_j s_j^T\, q_j - 2k\,[I - n_1 n_1^T]\, s_j s_j^T\, W_\infty + 2k\,[I - n_1 n_1^T]\, d_j s_j $$

Define

$$ T_j \equiv 2\, s_j s_j^T \qquad (4.3.2.8) $$

$$ V_j \equiv 2\, d_j s_j \qquad (4.3.2.9) $$

$$ H_j \equiv (I - n_1 n_1^T)\, T_j \qquad (4.3.2.10) $$

thus

$$ q_{j+1} = q_j - k\,[I - n_1 n_1^T]\,[\, T_j q_j + T_j W_\infty - V_j \,] \qquad (4.3.2.11) $$

This may be rewritten as

$$ q_{j+1} = q_j - k\,\varphi_j \qquad (4.3.2.12) $$

where

$$ \varphi_j = H_j q_j + h_j \qquad (4.3.2.13) $$

$$ h_j \equiv (I - n_1 n_1^T)\,(T_j W_\infty - V_j) \qquad (4.3.2.14) $$

Note that $E\{H_j\}$ and $E\{h_j\}$ are independent of $j$. Also $H_j$ and $h_j$
are statistically independent of $H_k$ and $h_k$ if $j \ne k$, because we assumed
that $s_j$ and $s_k$ are statistically independent for $k \ne j$.

Noting that

$$ E\{T_j\} = 2\,\phi(s,s) \qquad (4.3.2.15) $$

and

$$ E\{V_j\} = 2\,\phi(s,d) = 2\,\phi(s,s)\, W_{LMS} \qquad (4.3.2.16) $$

it is easily shown that

$$ E\{h_j\} = 0 $$

Note that $E\{H_j\} = 2\,(I - n_1 n_1^T)\,\phi(s,s) \equiv \Omega$.

The algorithm is thus

$$ q_{j+1} = q_j - k\,\varphi_j \qquad \text{where} \qquad \varphi_j = H_j q_j + h_j $$

$H_j$ is a sequence of random $n \times n$ matrices; $h_j$ is a sequence of random
n-tuple vectors; the expected values of $H_j$ and $h_j$ were shown to be independent
of $j$; $H_j$ and $h_j$ are independent of $H_l$ and $h_l$ for $l \ne j$; $E\{h_j\} = 0$;
and the elements of $H_j$ and $h_j$ have finite variance, with $E\{H_j\} = \Omega$.


Under these conditions, it is shown in Appendix A that for $k$ sufficiently
small,

$$ \lim_{j \to \infty} \| E\{q_j\} \| = 0 \qquad (4.3.2.17) $$

and

$$ \limsup_{j \to \infty} \| q_j \| \le V(k) \qquad (4.3.2.18) $$

where the norm of a random vector $u$ is defined as

$$ \|u\| \equiv [\, E\{u^T u\} \,]^{1/2} \qquad (4.3.2.19) $$

and

$$ \lim_{k \to 0} V(k) = 0 \qquad (4.3.2.20) $$

Equation (4.3.2.17) shows again that the mean of the weight vector converges to
$W_{opt}$, and equation (4.3.2.18) shows that the variance of the random weight
vector about its expected value is bounded, and the bound can be made as
small as desired by choosing $k$ sufficiently small.

The rate of convergence of the mean of the weight vector is shown
in the proof of the above theorem to be bounded by $\xi$ where

$$ \xi \le \| I - 2k\,\phi \| \qquad (4.3.2.21) $$

and $0 < \xi < 1$ as shown in section 4.3.1.

Section 4.3.3 The Algorithm, Proof of Convergence, and Bounds on the
Rate of Convergence if the Gradient is Estimated, and
the Estimate is Noisy

When our estimated gradient contains noise, wherever we have
the quantity $s_j$ in section 4.3.2 we replace it by $s_j + n_j$. To characterize
the noise we will assume
$E\{n_j\} = 0$, $E\{n_j n_j^T\} = \phi_n$, and $s_j$, $s_k$, $n_n$, $n_m$ are statistically
independent for $k \ne j$ and $n \ne m$.

The algorithm becomes

$$ W_{j+1} = W_j + 2k\,[I - n_1 n_1^T]\,(s_j + n_j)\,[\, d_j - (s_j^T + n_j^T)\, W_j \,] \qquad (4.3.3.1) $$

with the initial condition the same as before, i.e.

$$ W_0^T n_1 = a \qquad (4.3.3.2) $$


Equations (4.3.3.1) represent a set of first-order stochastic difference
equations, with forcing stochastic vectors $s_j$ and $n_j$.

Let's find the asymptotic expected value of $W_j$:

$$ E\{W_{j+1}\} = E\{W_j\} + 2k\,[I - n_1 n_1^T]\,[\,\phi(s,d) - \phi(s,s)\,E\{W_j\} - \phi_n\, E\{W_j\}\,] $$

Using (4.1.7a) and setting $E\{W_{j+1}\} = E\{W_j\} = W_\infty$ gives

$$ 0 = [I - n_1 n_1^T]\,\phi\,[\,(I + \phi^{-1}\phi_n)\, W_\infty - W_{LMS}\,] \qquad (4.3.3.3) $$

Taking the expected value of (4.3.3.2) yields

$$ W_\infty^T n_1 = a \qquad (4.3.3.4) $$

Define

$$ x \equiv (I + \phi^{-1}\phi_n)\, W_\infty - W_{LMS} \qquad (4.3.3.5) $$

then

$$ [I - n_1 n_1^T]\,\phi\, x = 0 \qquad (4.3.3.6) $$

By the previous arguments, a solution to (4.3.3.6) exists and is unique
because

$$ \mathrm{rank}\,[I - n_1 n_1^T]\,\phi = n - 1 $$

Equation (4.3.3.6) is the same as equation (4.3.1.7), thus the solution
is given by

$$ x = \alpha\, \phi^{-1} n_1 \qquad (4.3.3.7) $$

where the value of $\alpha$ is chosen so as to satisfy the initial condition,
given by equation (4.3.3.4), i.e.

$$ \alpha = \frac{a - n_1^T (I + \phi^{-1}\phi_n)^{-1} W_{LMS}}{n_1^T (I + \phi^{-1}\phi_n)^{-1} \phi^{-1} n_1} \qquad (4.3.3.8) $$

The solution for $W_\infty$ is thus

$$ W_\infty = (I + \phi^{-1}\phi_n)^{-1} \left[\, W_{LMS} + \phi^{-1} n_1\, \frac{a - n_1^T (I + \phi^{-1}\phi_n)^{-1} W_{LMS}}{n_1^T (I + \phi^{-1}\phi_n)^{-1} \phi^{-1} n_1} \,\right] \qquad (4.3.3.9) $$

Remembering that

$$ W_{opt} = W_{LMS} + \phi^{-1} n_1\, (n_1^T \phi^{-1} n_1)^{-1}\, (a - n_1^T W_{LMS}) $$

we see that $W_\infty$ differs from $W_{opt}$ in this case, i.e. a bias exists, and
the bias approaches zero as the noise matrix $\phi_n$ approaches the zero
matrix.
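The size of this bias can be checked numerically. The Python sketch below evaluates the closed form for $W_\infty$ in the special case where both $\phi$ and $\phi_n$ are diagonal (an assumption made here only to avoid a general matrix inverse), using the values of $\phi$, $W_{LMS}$, $n_1$ and $a$ that appear in the simulations of Section 4.4. The function name is hypothetical.

```python
import math


def asymptotic_weights(phi_diag, noise_diag, w_lms, n1, a):
    """Closed-form W_infinity for diagonal phi and phi_n (diagonal case only)."""
    # (I + phi^{-1} phi_n)^{-1} is diagonal with entries phi / (phi + phi_n).
    d = [p / (p + q) for p, q in zip(phi_diag, noise_diag)]
    # (I + phi^{-1} phi_n)^{-1} phi^{-1} = (phi + phi_n)^{-1} in the diagonal case.
    r = [1.0 / (p + q) for p, q in zip(phi_diag, noise_diag)]
    num = a - sum(n * di * w for n, di, w in zip(n1, d, w_lms))
    den = sum(n * ri * n for n, ri in zip(n1, r))
    alpha = num / den
    return [di * w + alpha * ri * n for di, w, ri, n in zip(d, w_lms, r, n1)]


if __name__ == "__main__":
    phi = [1.0, 2.0, 3.0, 4.0]
    w_lms = [1.0, 1.0, 1.0, 1.0]
    n1 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0, 0.0]
    for level in (0.0, 0.1, 1.0):
        print(level, asymptotic_weights(phi, [level] * 4, w_lms, n1, 3.0))
```

With zero noise the result reduces to $W_{opt}$; with $\phi_n = 0.1\,I$ it lands close to the values quoted for that case in Section 4.4, and the constraint $n_1^T W_\infty = a$ holds for every noise level.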

Again, since the weight vectors are random, before we can conclude
that the weight vectors converge we must examine
the variations of the weight vectors about their asymptotic expected value,
given by (4.3.3.9).

Define

$$ q_j \equiv W_j - W_\infty \qquad (4.3.3.10) $$

In terms of $q_j$, the algorithm (4.3.3.1) becomes

$$ q_{j+1} = q_j - 2k\,[I - n_1 n_1^T]\,(s_j + n_j)(s_j^T + n_j^T)\, q_j - 2k\,[I - n_1 n_1^T]\,(s_j + n_j)(s_j^T + n_j^T)\, W_\infty + 2k\,[I - n_1 n_1^T]\, d_j\,(s_j + n_j) $$

Define

$$ T_j \equiv 2\,(s_j + n_j)(s_j^T + n_j^T) \qquad (4.3.3.11) $$

$$ V_j \equiv 2\, d_j\,(s_j + n_j) \qquad (4.3.3.12) $$

$$ H_j \equiv (I - n_1 n_1^T)\, T_j \qquad (4.3.3.13) $$

thus

$$ q_{j+1} = q_j - k\,[I - n_1 n_1^T]\,[\, T_j q_j + T_j W_\infty - V_j \,] \qquad (4.3.3.14) $$

This may be rewritten as

$$ q_{j+1} = q_j - k\,\varphi_j \qquad (4.3.3.15) $$

where

$$ \varphi_j = H_j q_j + h_j \qquad (4.3.3.16) $$

$$ h_j \equiv (I - n_1 n_1^T)\,(T_j W_\infty - V_j) \qquad (4.3.3.17) $$

Note that $E\{H_j\}$ and $E\{h_j\}$ are independent of $j$. Also $H_j$ and $h_j$ are
statistically independent of $H_k$ and $h_k$ if $j \ne k$ because we assumed
$s_j$, $s_k$, $n_n$, $n_m$ are statistically independent for $k \ne j$ and $n \ne m$.

Noting that

$$ E\{T_j\} = 2\,[\phi + \phi_n] \qquad (4.3.3.18) $$

$$ E\{V_j\} = 2\,\phi(s,d) = 2\,\phi\, W_{LMS} \qquad (4.3.3.19) $$

it is easy to show that

$$ E\{h_j\} = 0 $$

Also $E\{H_j\} = 2\,(I - n_1 n_1^T)\,(\phi + \phi_n) \equiv \Omega$.

By the same argument as before, we may show that for $k$
sufficiently small,

$$ \lim_{j \to \infty} \| E\{q_j\} \| = 0 \qquad (4.3.3.20) $$

and

$$ \limsup_{j \to \infty} \| q_j \| \le V(k) \qquad (4.3.3.21) $$

This proves convergence.

Again the rate of convergence is bounded by $\xi$, which depends upon
$k$, the eigenvalues of $(\phi + \phi_n)$, and the constraint.

Section 4.4 Simulation Results

As a check on the theoretical work we have done in this chapter,
we programmed the following algorithms on an IBM 360/50 in Fortran IV.

Let us first consider the algorithm given by equation (4.3.1.4)
where the gradient is assumed to be known. We let the dimension of
the vector $W$ be four.

Let

$$ d_j = [\,1\;\;1\;\;1\;\;1\,] \begin{bmatrix} s_{1j} \\ s_{2j} \\ s_{3j} \\ s_{4j} \end{bmatrix} \qquad (4.4.1) $$

$$ \phi(s,s) \equiv \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix} \qquad (4.4.2) $$

i.e. all components of the vector $s_j$ are assumed to be gaussian, zero
mean, and uncorrelated.


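Thus

$$ \phi^{-1}(s,s) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & \tfrac{1}{3} & 0 \\ 0 & 0 & 0 & \tfrac{1}{4} \end{bmatrix} \qquad (4.4.3) $$

and

$$ \phi(s,d) = \begin{bmatrix} E\{s_{1j} d_j\} \\ E\{s_{2j} d_j\} \\ E\{s_{3j} d_j\} \\ E\{s_{4j} d_j\} \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \qquad (4.4.4) $$

The LMS value of the weights is given by

$$ W_{LMS} = \phi^{-1}(s,s)\,\phi(s,d) = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad (4.4.5) $$

For our constraint we let

$$ n_1 = \begin{bmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad a = 3 \qquad (4.4.6) $$

i.e. the linear constraint equation is

$$ W_1 - W_2 = 3\sqrt{2} $$

which means that there are no constraints on $W_3$ and $W_4$.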


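For our initial conditions we
considered two cases:

$$ W_0 = \begin{bmatrix} 3\sqrt{2} \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{or} \quad W_0 = \begin{bmatrix} 10 + 3\sqrt{2} \\ 10 \\ 0 \\ 0 \end{bmatrix} $$

which exactly satisfy the constraint.
The Lagrange solution is

$$ W_{opt} = \begin{bmatrix} 1 + 2\sqrt{2} \\ 1 - \sqrt{2} \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3.82 \\ -0.41 \\ 1.0 \\ 1.0 \end{bmatrix} \qquad (4.4.7) $$

A limit on the values of $k$ which insure convergence is, from
equation (4.3.1.22), given by those values of $k$ for which $\xi < 1$, i.e.

$$ k < \frac{1}{4} $$

The algorithm is (see equation 4.3.1.4)

$$ W_{j+1} = W_j - 2k\,[I - n_1 n_1^T]\,\phi\,(W_j - W_{LMS}) \qquad (4.4.8) $$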


where

$$ W_0 = \begin{bmatrix} 3\sqrt{2} \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{or} \quad W_0 = \begin{bmatrix} 10 + 3\sqrt{2} \\ 10 \\ 0 \\ 0 \end{bmatrix} $$

Using the values we have chosen for $n_1$ and $\phi$, the algorithm
may be rewritten as

$$ W_{j+1} = W_j - k \begin{bmatrix} W_{1j} + 2W_{2j} - 3 \\ W_{1j} + 2W_{2j} - 3 \\ 6W_{3j} - 6 \\ 8W_{4j} - 8 \end{bmatrix} \qquad (4.4.9) $$

In the steady state, $W$ should converge to

$$ \begin{bmatrix} 3.82 \\ -0.41 \\ 1.0 \\ 1.0 \end{bmatrix} $$

and the asymptotic MSE should be given by

$$ E\{\epsilon_j^2\} = \overline{d_j^2} - 2\,\phi^T(s,d)\, W + W^T \phi(s,s)\, W $$

evaluated at $W = W_\infty = W_{opt}$, which is

$$ E\{\epsilon_j^2\}\,\Big|_{W = W_\infty = W_{opt}} = 12.0 \qquad (4.4.10) $$


We ran the above algorithm for various values of $k$, with the
initial condition $W_0 = \mathrm{col}\,[\,10 + 3\sqrt{2},\; 10,\; 0,\; 0\,]$, and the results are
shown in Fig. 4.4.1. Note that as $k$ increased from 0.01 to 0.25 (above
which we no longer have convergence, theoretically or in the simulation,
as demonstrated by Fig. 4.4.1 when we let $k = 0.252$) the rate of convergence
agrees with the bound given by Fig. 4.3.3. Fig. 4.4.2 shows
how the norm of the vector $q$ (see equation 4.3.1.9) converges to
zero for various values of $k$. From this graph we can compare
the actual time constant, for a particular value of $k$, to the theoretical
bound on the time constant ($\xi$), e.g. for $k = 0.01$, $|q|$ decreased from
15.82 to 12.26 in ten iterations. Setting $12.26 = \xi^{10}\,(15.82)$ implies $\xi = 0.975$,
which is in agreement with Fig. 4.3.3, which bounds the rate of convergence
for this value of $k$ by $1 - 2k\rho_1 = 0.98$.
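The known-gradient experiment is simple enough to re-create. The Python sketch below is not the original Fortran IV program; it is a hedged re-implementation of the iteration (4.4.8) for the diagonal $\phi$ of (4.4.2), with function names chosen here for illustration. It uses the fact that $d_j$ is an exact linear function of $s_j$ in this example, so the minimum MSE is zero and the MSE equals $(W - W_{LMS})^T \phi\, (W - W_{LMS})$.

```python
import math

PHI = [1.0, 2.0, 3.0, 4.0]                    # diagonal of phi(s, s), eq. (4.4.2)
W_LMS = [1.0, 1.0, 1.0, 1.0]                  # eq. (4.4.5)
N1 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0, 0.0]
W_OPT = [1 + 2 * math.sqrt(2), 1 - math.sqrt(2), 1.0, 1.0]


def step(w, k):
    """One iteration of W_{j+1} = W_j - 2k (I - n1 n1^T) phi (W_j - W_LMS)."""
    grad = [2.0 * p * (wi - wl) for p, wi, wl in zip(PHI, w, W_LMS)]
    along = sum(n * g for n, g in zip(N1, grad))       # gradient component along n1
    return [wi - k * (g - along * n) for wi, g, n in zip(w, grad, N1)]


def mse(w):
    """MSE for this example; the minimum MSE is zero because d_j = sum of s_j."""
    return sum(p * (wi - wl) ** 2 for p, wi, wl in zip(PHI, w, W_LMS))


def run(k, iterations, w0):
    w = list(w0)
    for _ in range(iterations):
        w = step(w, k)
    return w


if __name__ == "__main__":
    w0 = [10 + 3 * math.sqrt(2), 10.0, 0.0, 0.0]
    for k in (0.01, 0.1, 0.2):
        w = run(k, 3000, w0)
        print(k, [round(x, 3) for x in w], round(mse(w), 3))
```

For step sizes below 0.25 the weights settle at $W_{opt}$ with an MSE of 12, while a step size slightly above 0.25 makes the fourth weight oscillate with growing amplitude.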



[Figure: MSE plot]

[Fig. 4.4.2: $|q|$ plot. Gradient Known, No Additive Noise]

If the gradient must be estimated from the incoming data, the
algorithm (see equation 4.3.2.2) is

$$ W_{j+1} = W_j + 2k\,[I - n_1 n_1^T]\, s_j\,[\, d_j - s_j^T W_j \,] \qquad (4.4.11) $$

Using the values we have chosen for $n_1$ and $d_j$ the algorithm may be
rewritten in the form

$$ W_{j+1} = W_j + k \begin{bmatrix} (s_{j1} + s_{j2})\, u_j \\ (s_{j1} + s_{j2})\, u_j \\ 2\, s_{j3}\, u_j \\ 2\, s_{j4}\, u_j \end{bmatrix} \qquad (4.4.12) $$

where

$$ u_j = (s_{j1} + s_{j2} + s_{j3} + s_{j4}) - (s_{j1} W_{j1} + s_{j2} W_{j2} + s_{j3} W_{j3} + s_{j4} W_{j4}) \qquad (4.4.13) $$

In the steady state, $W_j$ should converge to the same values
as before, and the asymptotic MSE should be 12.0.

The results of the simulation for $k = 0.01$ and

$$ W_0 = \begin{bmatrix} 10 + 3\sqrt{2} \\ 10 \\ 0 \\ 0 \end{bmatrix} $$

are shown in Figs. 4.4.3 and 4.4.4 and agree with the
theoretical values above.
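The estimated-gradient run can be imitated with a few lines of Python. This is a hedged sketch rather than the original program: the pseudo-random generator, the seed, the iteration count and the averaging window are all assumptions made here, and the function name is hypothetical. The update follows (4.4.12) and (4.4.13), with the components of $s_j$ drawn as independent zero-mean gaussian samples whose variances are the diagonal entries of $\phi(s,s)$.

```python
import math
import random


def simulate(k=0.01, iterations=10000, burn_in=2000, seed=0):
    """Run the estimated-gradient algorithm and return (final W, time-averaged W)."""
    rng = random.Random(seed)
    sigma = [math.sqrt(v) for v in (1.0, 2.0, 3.0, 4.0)]  # std devs from phi(s, s)
    w = [10 + 3 * math.sqrt(2), 10.0, 0.0, 0.0]           # initial condition
    total = [0.0] * 4
    for j in range(iterations):
        s = [rng.gauss(0.0, sd) for sd in sigma]
        u = sum(s) - sum(si * wi for si, wi in zip(s, w))  # eq. (4.4.13)
        common = (s[0] + s[1]) * u
        w = [w[0] + k * common,                            # eq. (4.4.12)
             w[1] + k * common,
             w[2] + 2 * k * s[2] * u,
             w[3] + 2 * k * s[3] * u]
        if j >= burn_in:
            total = [t + wi for t, wi in zip(total, w)]
    count = iterations - burn_in
    return w, [t / count for t in total]


if __name__ == "__main__":
    final, average = simulate()
    print("final weights   :", [round(x, 3) for x in final])
    print("averaged weights:", [round(x, 3) for x in average])
```

Individual weight vectors keep fluctuating because the gradient is only estimated, so the sketch averages the weights after a burn-in period before comparing them with $W_{opt}$. The difference $W_1 - W_2$ stays at $3\sqrt{2}$ throughout, since both components receive the same correction.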



[Fig. 4.4.3: plot vs. iteration number. Gradient Estimated, No Additive Noise]


Finally if the gradient must be estimated from the incoming data,
and the incoming data is noisy, the algorithm (see equation 4.3.3.1)
becomes

$$ W_{j+1} = W_j + 2k\,[I - n_1 n_1^T]\,(s_j + n_j)\,[\, d_j - (s_j^T + n_j^T)\, W_j \,] \qquad (4.4.14) $$

Using our specific values for the above quantities, the algorithm may
be rewritten as

$$ W_{j+1} = W_j + k\, u_j \begin{bmatrix} s_{j1} + n_{j1} + s_{j2} + n_{j2} \\ s_{j1} + n_{j1} + s_{j2} + n_{j2} \\ 2\,(s_{j3} + n_{j3}) \\ 2\,(s_{j4} + n_{j4}) \end{bmatrix} \qquad (4.4.15) $$

where

$$ u_j = d_j - (s_j^T + n_j^T)\, W_j \qquad (4.4.16) $$

Let the noise correlation matrix be

$$ \phi_n = 0.1\, I \qquad (4.4.17) $$

In this case, see equation (4.3.3.9), the average asymptotic value
$W_\infty$ should be

$$ W_\infty = \begin{bmatrix} 3.72 \\ -0.515 \\ 0.967 \\ 0.975 \end{bmatrix} \qquad (4.4.18) $$

and the asymptotic MSE should be $\approx 11.9$. The results of this
simulation, for $k = 0.01$ and

$$ W_0 = \begin{bmatrix} 10 + 3\sqrt{2} \\ 10 \\ 0 \\ 0 \end{bmatrix} $$

are shown in Figs. 4.4.5 and 4.4.6.

Figs. 4.4.7 - 4.4.10 indicate how the convergence rate and
asymptotic MSE change as the additive noise in the incoming data increases.
Figs. 4.4.7 and 4.4.8 correspond to $\phi_n = 1.0\, I$, $W_\infty = \mathrm{col}\,[\,3.17\;\; -1.12\;\; 0.747\;\; 0.80\,]$,
and asymptotic MSE $\approx 14.0$. Figs. 4.4.9 and 4.4.10 correspond
to $\phi_n = 10.0\, I$, $W_\infty = \mathrm{col}\,[\,2.34\;\; -1.90\;\; 0.28\;\; 0.29\,]$, and asymptotic
MSE $\approx 28$.

Comparing Figs. 4.4.5, 4.4.7, 4.4.9, and 4.4.3 we see that it took
longer to converge when we had additive noise than when we did not have
additive noise in the incoming data.

In Figs. 4.4.11 and 4.4.12 we kept everything the same as in Figs.
4.4.5 and 4.4.6 except that we started at $W_0 = \mathrm{col}\,[\,3\sqrt{2},\; 0,\; 0,\; 0\,]$, which
is much closer to the steady-state value, and expanded the vertical
scale. From these figures we notice that the MSE is somewhat sensitive
to the occasional noise sample whose value is greater than three or four
standard deviations away from the mean value of the noise, which in our
case is zero. This suggests that one might achieve a smaller value for
the steady state variance if the algorithm were

$$ W_{j+1} = W_j - k\, f[\nabla(\mathrm{MSE})] $$

where

$$ f[\nabla(\mathrm{MSE})] = \begin{cases} \nabla(\mathrm{MSE}) & \text{if } \nabla(\mathrm{MSE}) < K_0 \\ K_0 & \text{if } \nabla(\mathrm{MSE}) > K_0 \end{cases} $$

However, this approach was not investigated further.
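Since this limiter was not pursued in the text, the following Python sketch is only one possible reading of it. It assumes the limit $K_0$ is applied to the magnitude of each gradient component separately, with the sign preserved; applying the limit to the norm of the whole gradient would be an equally plausible reading. The function names are hypothetical.

```python
def clip_gradient(grad, k0):
    """Limit every gradient component to the interval [-k0, k0] (assumed reading)."""
    return [max(-k0, min(k0, g)) for g in grad]


def clipped_step(w, grad, k, k0):
    """W_{j+1} = W_j - k f[grad], with f taken as the component-wise limiter."""
    return [wi - k * gi for wi, gi in zip(w, clip_gradient(grad, k0))]


if __name__ == "__main__":
    # A single large gradient sample moves the weights by at most k * k0 per component.
    print(clipped_step([1.0, 1.0], [10.0, -0.5], 0.1, 2.0))
```

In the constrained algorithms of this chapter the limited gradient would still have to be projected with $[I - n_1 n_1^T]$ before the update, otherwise the constraint would no longer be preserved.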







[Fig. 4.4.7: MSE plot. Gradient Estimated, Plus Additive Noise]

[Fig. 4.4.8: $|q|$ plot. Gradient Estimated, Plus Additive Noise]

[Fig. 4.4.9: plot vs. iteration number. Gradient Estimated, Plus Additive Noise]

[Fig. 4.4.11: MSE plot. Gradient Estimated, Plus Additive Noise]

Appendix A Proof of Convergence and Bounds on the Asymptotic
Variance

This theorem is essentially the same as Appendix C of Gersho's
paper.

Theorem: Let $H_k$ be a sequence of random $N \times N$ matrices and
$h_k$ a sequence of random N-tuple vectors. Suppose $E\{H_k\}$ and $E\{h_k\}$
are independent of $k$; $H_k$ and $h_k$ are independent of $H_j$ and $h_j$ for $k \ne j$;
$E\{h_k\} = 0$; the elements of $H_k$ and $h_k$ have finite variance; $E\{H_k\} \equiv \Omega$;
and $\xi \equiv \| I - k\,\Omega \| = 1 - kc$ where $c > 0$.

Define the random sequence $q_j$ by:

$$ q_{j+1} = q_j - k\,\varphi_j \qquad (A1) $$

$$ \varphi_j = H_j q_j + h_j \qquad (A2) $$

for $j = 0, 1, 2, \ldots$ and $q_0$ is an arbitrary deterministic vector. Then for
$k$ positive and sufficiently small

$$ \lim_{j \to \infty} \| E\{q_j\} \| = 0 \qquad (A3) $$

and

$$ \limsup_{j \to \infty} \| q_j \| \le V(k) \qquad (A4) $$

with $V(k)$ satisfying

$$ \lim_{k \to 0} V(k) = 0 \qquad (A5) $$

Note that the norm of a random vector $u$ is defined as

$$ \|u\| \equiv [\, E\{u^T u\} \,]^{1/2} \qquad (A6) $$

Proof: Combining equations (A1) and (A2) yields

$$ q_{j+1} = (I - k H_j)\, q_j - k\, h_j \qquad (A7) $$

Since $q_j$ is independent of $H_j$, taking the expected value of equation
(A7) gives

$$ E\{q_{j+1}\} = (I - k\,\Omega)\, E\{q_j\} \qquad (A8) $$

Thus

$$ \| E\{q_j\} \| \le \xi^{\,j}\, \| E\{q_0\} \| \qquad (A9) $$

Since $\xi < 1$ by hypothesis, equation (A3) follows.
To prove equation (A4), observe that

$$ E\{q_{j+1}^T q_{j+1}\} = E\{q_j^T (I - kH_j^T)(I - kH_j)\, q_j\} - E\{q_j^T (I - kH_j^T)\, k h_j\} - E\{k h_j^T (I - kH_j)\, q_j\} + k^2 E\{h_j^T h_j\} \qquad (A10) $$

But since $q_j$ is independent of $H_j$ the first term in equation (A10)
may be bounded by

$$ E\{q_j^T (I - kH_j^T)(I - kH_j)\, q_j\} = E\{q_j^T\, E\{(I - kH_j^T)(I - kH_j)\}\, q_j\} \le \| E\{(I - kH_j^T)(I - kH_j)\} \|\; \|q_j\|^2 = \mu\, \|q_j\|^2 \qquad (A11) $$

where

$$ \mu \equiv \| E\{(I - kH_j^T)(I - kH_j)\} \| \qquad (A12) $$

Note that $x^T A x \le \|A\|\, \|x\|^2$.

Combining the second and third terms and using the Schwarz
inequality gives

$$ -2k\, E\{q_j^T (I - kH_j^T)\, h_j\} = -2k\,[\, E\{q_j^T\}\, E\{h_j\} - k\, E\{q_j^T\}\, E\{H_j^T h_j\} \,] = 2k^2\, E\{q_j^T\}\, E\{H_j^T h_j\} \le 2k^2 f\, \| E\{q_j\} \| \qquad (A13) $$

where

$$ f \equiv \| E\{H_j^T h_j\} \| \qquad (A14) $$

and $f$ is finite.

Using (A9) we get

$$ -2k\, E\{q_j^T (I - kH_j^T)\, h_j\} \le 2k^2 f\, \| E\{q_0\} \| \qquad (A15) $$

Applying the bounds (A11) and (A15) to (A10) yields

$$ \|q_{j+1}\|^2 = E\{q_{j+1}^T q_{j+1}\} \le \mu\, \|q_j\|^2 + 2k^2 f\, \| E\{q_0\} \| + k^2\, \|h_j\|^2 \qquad (A16) $$

If we now define the bounding sequence of positive numbers $Q_j$ according
to

$$ Q_0 = \|q_0\|^2 \qquad (A17) $$

and

$$ Q_{j+1} = \mu\, Q_j + 2k^2 f\, \| E\{q_0\} \| + k^2\, \|h_j\|^2 \qquad (A18) $$

then it follows from (A16) that

$$ \|q_{j+1}\|^2 \le Q_{j+1} \qquad (A19) $$

But the difference equation (A18) has the asymptotic solution

$$ \lim_{j \to \infty} Q_j = \frac{2k^2 f\, \| E\{q_0\} \| + k^2\, \|h_j\|^2}{1 - \mu} \qquad (A20) $$

because $\mu < 1$.

Thus

$$ \limsup_{j \to \infty} \|q_j\|^2 \le \frac{2k^2 f\, \| E\{q_0\} \| + k^2\, \|h_j\|^2}{1 - \mu} \qquad (A21) $$

where $\|h_j\|$ is independent of $j$ by hypothesis.

Let us investigate the positive constant $\mu$: if

$$ G_j \equiv H_j - \Omega \qquad (A22) $$

then

$$ (I - kH_j^T)(I - kH_j) = (I - k\Omega^T - kG_j^T)(I - k\Omega - kG_j) = (I - k\Omega^T)(I - k\Omega) - (I - k\Omega^T)\, kG_j - kG_j^T (I - k\Omega) + k^2 G_j^T G_j $$

and, since $E\{G_j\} = 0$,

$$ E\{(I - kH_j^T)(I - kH_j)\} = (I - k\Omega^T)(I - k\Omega) + k^2 E\{G_j^T G_j\} \qquad (A23) $$

$$ \mu = \| (I - k\Omega^T)(I - k\Omega) + k^2 E\{G_j^T G_j\} \| \le \xi^2 + k^2 \gamma \qquad (A24) $$

where $\gamma \equiv \| E\{G_j^T G_j\} \|$ is finite.

Furthermore, in all cases $\xi$ is of the form $\xi = 1 - kc$ where $c > 0$.
Hence $\mu \le 1 - 2kc + k^2 (c^2 + \gamma)$, which is less than 1 for $k$ positive and
sufficiently small, and the right-hand side of (A21) is then of order $k$,
so that it tends to zero as $k \to 0$.

QED
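The two facts used at the end of the proof, namely that a recursion of the form (A18) settles at its fixed point and that the resulting bound shrinks with $k$, can be checked numerically. The Python sketch below uses made-up constants; the symbol `b` lumps together the $k$-independent factor $2f\,\|E\{q_0\}\| + \|h_j\|^2$ and is an assumption introduced here for illustration, as are the function names.

```python
def bounding_sequence(q0, mu, c, iterations):
    """Iterate Q_{j+1} = mu * Q_j + c, the form of recursion (A18)."""
    q = q0
    for _ in range(iterations):
        q = mu * q + c
    return q


def variance_bound(k, c, gamma, b):
    """Right-hand side of (A21) with mu replaced by its bound (1 - kc)^2 + k^2 gamma."""
    mu = (1.0 - k * c) ** 2 + k * k * gamma
    if mu >= 1.0:
        raise ValueError("k is too large: mu must be below 1")
    return k * k * b / (1.0 - mu)


if __name__ == "__main__":
    print(bounding_sequence(100.0, 0.9, 0.5, 500))   # approaches 0.5 / (1 - 0.9)
    for k in (0.1, 0.01, 0.001):
        print(k, variance_bound(k, c=1.0, gamma=5.0, b=2.0))
```

For small $k$ the bound behaves like $k\,b / (2c)$, which is the sense in which the asymptotic variance can be made as small as desired by reducing the step size.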
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IZ1)~122) 

Appendix B Rosen's Gradient Projection Algorithm

In this investigation we indicated that our gradient projection
algorithm, which adaptively adjusted the tap gains, could be thought of
as a modification of Rosen's algorithm. Therefore, let us now summarize
some well-known linear (and nonlinear) programming methods
of optimizing functions subject to linear (and nonlinear) constraints
when no noise is present; explain why Rosen's method is applicable
to the problem of optimizing functions subject to both constraints and
noise; and illustrate, for those unfamiliar with Rosen's algorithm,
how it would be used to locate the maximum of a concave function
subject to linear constraints.

We restrict our discussion to gradient methods of linear and
nonlinear programming because other methods of optimizing convex
functions (e.g. Simplex) work essentially by examining the vertices
of the feasible region, and testing whether or not the conditions for
optimality are satisfied at the vertex being tested. If the conditions
are not satisfied we jump to the next vertex. However, since the
vertices may be far away from one another, jumping from one vertex
to another is not what we want in an adaptive algorithm, which must
have the property that if we are not at the exact optimum we must
still be "close to" the exact optimum, not at the next vertex which
may be a considerable distance away. Another point to consider is
that at any single iteration we do not want to move too great a distance,
because we will sometimes be moving in the wrong direction due to
the presence of noise. This is another reason why we don't want to
consider just vertices, but rather all points on the boundary of the
feasible domain.

All gradient procedures work by moving from an iteration point
$x^k$ in the direction of the gradient or, if this is not possible because of
the constraints, in the direction of a vector $s$ which makes an acute angle
with the gradient, i.e. $s^T \nabla F(x^k) > 0$. We move in this direction until
either $F$ reaches its maximum in this direction or until we cannot go further
without leaving the feasible domain. The end point gives the next iteration
value $x^{k+1}$. We never leave the feasible domain throughout the entire iteration.

Zoutendijk's (24) method chooses $s$ so that, after a suitable normalization,
its scalar product with the gradient is maximized under the condition
that we do not immediately leave the feasible domain when moving from
$x^k$ in the direction $s$. We will not use this algorithm because the
maximization step uses the above-mentioned linear programming methods, which
are adversely affected by noise. Another procedure is to restrict the vector
$s$ to lie in a certain linear manifold of dimension smaller than $n$. This
approach is used by Rosen. These two methods are somewhat similar. We
will use Rosen's method because the iteration steps appear to be simpler
and should use less computer time.

We will abstract pp. 163-170 from Kunzi, Krelle, and Oettli and
some numerical examples from Hadley. For more details and proofs as
well as a discussion of how the algorithm may be modified to account for
nonlinear constraints, see Rosen's original papers.

The problem is to maximize the concave function $F(x)$ subject to
the linear constraints (nonlinear constraints are discussed in Rosen's
second paper)

$$ h_j(x) = a_j^T x - b_j \le 0 , \qquad j = 1, 2, \ldots, m \qquad (B1) $$

where $x$ is an $n$ dimensional vector.

If a point $x^0$ of the feasible domain (i.e. $x^0$ satisfies all the constraints)
is not the constrained maximum, then we may look for another
feasible point with a higher function value by proceeding from $x^0$ in the
direction of the gradient of the objective function. This is always possible
if $x^0$ is an interior point. However, the method can fail if $x^0$ is a boundary
point, because the gradient vector may point toward the exterior of the feasible
domain. Rosen's method is to project the gradient onto the boundary of the
feasible domain and then proceed in the direction of the projection rather
than in the direction of the gradient itself. More precisely, the gradient is
projected onto a linear submanifold of the boundary, i.e. on the submanifold
of least dimension that contains $x^0$. In three dimensional space, for
instance, the feasible domain is a polyhedron whose boundary consists of
manifolds of dimension two (faces), dimension one (edges), and dimension
zero (vertices). If $x^0$ lies on a face but not on an edge, the gradient is
projected onto this face; if $x^0$ lies on an edge, we project on the edge. Rosen's
method coincides with the usual gradient method if the point $x^0$ lies in the
interior of the feasible domain.

We denote the $(n-1)$ dimensional manifold (boundary hyperplane) defined
by $h_j(x) = 0$ by $H_j$, i.e.

$$ H_j = \{\, x \mid h_j(x) = 0 \,\} , \qquad j = 1, 2, \ldots, m \qquad (B2) $$


The boundary of the feasible domain consists of all feasible points
[$h_j(x) \le 0$ for all $j$] with $h_j(x) = 0$ for at least one $j$. The (non-normalized)
normal vector $a_j$ is perpendicular to $H_j$ and points outward from the feasible
domain. A number of hyperplanes $H_j$ are linearly independent if the
corresponding $a_j$ are linearly independent. The intersection of $k$ hyperplanes is
the set of points which lie simultaneously on all $k$ hyperplanes. The intersection
of $k$ linearly independent hyperplanes forms an $(n-k)$ dimensional linear
manifold in the $n$ dimensional space of the $x$ vectors.

Let us now consider the projection of the gradient vector. Say $x^0$ lies
on $r$ hyperplanes. We pick out $q$ linearly independent hyperplanes from among
these $r$, which, after a suitable reordering of the indices, we may assume to be
$H_1, \ldots, H_q$. Let $D$ denote the $(n-q)$ dimensional intersection of these
hyperplanes. The normals $a_1, \ldots, a_q$ are perpendicular to the linear manifold $D$.
The $q$ dimensional linear manifold spanned by $a_1, \ldots, a_q$ will be denoted by
$\tilde{D}$. $D$ and $\tilde{D}$ are mutually perpendicular and together span the whole space.
The projection of a vector $y$ on the linear manifold $D$ is denoted by $y_D$ and is
given by

$$ y_D = P_q\, y \qquad (B3) $$

where

$$ P_q = I - A_q (A_q^T A_q)^{-1} A_q^T \qquad (B4) $$

and

$$ A_q = (a_1\; a_2\; \ldots\; a_q) \qquad (B5) $$

Note that $P_0 = I$ and $P_n$ = zero matrix.

Rosen proves that the point $x^k$ is the unique constrained maximum
for concave objective functions if and only if $x^k$ satisfies

$$ P_q\, g(x^k) = 0 \qquad (B6) $$

$$ (A_q^T A_q)^{-1} A_q^T\, g(x^k) \ge 0 \qquad (B7) $$

where $g(x^k)$ is the gradient vector at point $x^k$.

Condition (B6) states that the gradient vector is orthogonal to the manifold
$D$, and thus lies in $\tilde{D}$. Hence

$$ g(x^k) = \sum_{j=1}^{q} u_j\, a_j = A_q\, u \qquad (B8) $$

Substituting (B8) into (B7) we see that (B7) may be rewritten as

$$ u \ge 0 $$

Equations (B6) and (B7) together imply that a necessary and sufficient
condition for the point $x^k$ to be a constrained maximum is that the gradient
of the objective function be expressible as a non-negative linear combination
of the exterior normals to the hyperplanes on which the point lies.
This is equivalent to the well-known Kuhn-Tucker conditions. If $x^k$ is an
interior point of the feasible domain, the optimality criterion simplifies to
$g(x^k) = 0$.
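The projection and the optimality test can be sketched in a few lines of Python. This is an illustration only, with hypothetical function names; instead of forming $P_q$ explicitly it solves the normal equations $(A_q^T A_q)\, u = A_q^T g$ for $u$ and uses $P_q g = g - A_q u$, which is algebraically the same thing. The active normals are assumed to be linearly independent.

```python
def solve(m, rhs):
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def project_gradient(normals, g):
    """Return (P_q g, u) with u = (A^T A)^{-1} A^T g and P_q g = g - A u."""
    if not normals:                      # interior point: P_0 = I
        return list(g), []
    gram = [[sum(x * y for x, y in zip(a, b)) for b in normals] for a in normals]
    rhs = [sum(x * y for x, y in zip(a, g)) for a in normals]
    u = solve(gram, rhs)
    pg = [gi - sum(uj * a[i] for uj, a in zip(u, normals)) for i, gi in enumerate(g)]
    return pg, u


def classify(normals, g, tol=1e-12):
    """Apply conditions (B6) and (B7) to the active normals and the gradient."""
    pg, u = project_gradient(normals, g)
    if any(abs(x) > tol for x in pg):
        return "projected gradient nonzero"   # (B6) fails: move along P_q g
    if any(uj < -tol for uj in u):
        return "negative multiplier"          # (B7) fails: release a hyperplane
    return "optimal"
```

For example, with the single active normal (0, 1) and gradient (1, 1) the projected gradient is (1, 0), so the point is not optimal; with active normals (1, 0) and (0, 1) and gradient (1, 2) both conditions hold.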

Whenever the conditions for optimality are not satisfied Rosen
shows there exists a feasible point $x^{k+1}$ which yields a higher objective
function value. There are two possibilities (we avoid discussing
degeneracies) which we consider separately. Denote $g(x^k)$ by $g_k$.

Case I $\quad P_q\, g_k \ne 0$

This means that $x^0$ is not a vertex of the feasible domain, i.e.
$q < n$, and $D$ has at least the dimension of a straight line. We move in
the direction given by the vector

$$ s^k = P_q\, g_k \qquad (B9) $$

We will not discuss here how far to move in this direction because this
part of Rosen's algorithm does not apply to our modification of Rosen's
algorithm.


Case II $\quad P_q\, g_k = 0$

but $u_j < 0$ for at least one $j$. We then choose one of the indices for which
$u_j < 0$, e.g. the one for which $|a_j|\, u_j$ is most negative, and then disregard
the corresponding hyperplane $H_j$. Suppose this is the hyperplane $H_q$.
Then $u_q < 0$, and we proceed as if $x$ lies only on $H_1$ to $H_{q-1}$, i.e. we raise
the dimension of $D$ by one. The associated projection matrix is now $P_{q-1}$.
We have $P_{q-1}\, a_q \ne 0$ because $a_q$ is independent of $a_1$ to $a_{q-1}$. This
implies that

$$ P_{q-1}\, g_k = P_{q-1}\,(z + u_q a_q) = u_q\, P_{q-1}\, a_q \ne 0 $$

where $z = \sum_{j=1}^{q-1} u_j a_j$ belongs to $\tilde{D}$. Consequently, in the new $D$, which has
one dimension more, we have the same situation as in case I, and we can proceed
as in that case by setting

$$ s^k = P_{q-1}\, g_k \qquad (B10) $$

These are the main steps involved in Rosen's algorithm. We add
that nonlinear constraints can also be handled, but we will not discuss that
algorithm here (see Rosen's papers, and chapter six of this investigation).

Finally we present two examples, taken from Hadley, to illustrate
how the algorithm works. Consider Fig. B1.

[Fig. B1: Diagram for example one]

Assume that the current feasible solution is $x^k$. We cannot move in the
direction of the gradient without violating constraint 1. The vector $s^k$ is
given by (B9)

$$ s^k = P_1\, g^k $$

This is nothing more than the perpendicular projection of $g^k$ onto the boundary
of the set of feasible solutions, as shown.

Consider next the situation illustrated in Fig. B2.

[Fig. B2]

Both constraints will be violated if we move in the direction of the gradient
vector. Also $P_2\, g^k = 0$, indicating that it is not possible to move from $x^k$
in any direction such that both constraints hold as strict equalities. Note
that when $g^k$ is expressed as a linear combination of $a_1$ and $a_2$,
$g^k = u_1 a_1 + u_2 a_2$, $u_2$ is negative. We can find a feasible direction
in which to move (case II) by allowing constraint 2 to hold as a strict
inequality, while constraint 1 holds as a strict equality. If we do this, the
problem is reduced to the previous illustration.

Chapter 5

Soft Constraints


Section 5.1 Introduction

In the last chapter, we devised an algorithm that minimizes an objective
function subject to constraints which were never to be violated. In this
chapter, we will devise an algorithm that differs from the gradient projection
algorithm of the previous chapter in that this algorithm minimizes an
objective function subject to constraints which may be "slightly" violated,
but which cannot be violated "too much." This type of constraint is known
in the literature as a "soft" constraint as opposed to the "hard" constraint
dealt with in chapter four.

Again, our final objective is to design an adaptive algorithm which
will maximize the SNR subject to a constraint on the super-gain ratio when
unknown interfering noise is present. Again, because the SNR and super-gain
ratio are nonlinear quantities, it is difficult to prove convergence of our
algorithm or to analytically find the algorithm's rate of convergence. Again,
for the purpose of mathematical tractability and because it is useful in its own
right, we will consider an adaptive algorithm which minimizes the MSE subject
to a linear constraint.

The algorithms of this chapter are simply a gradient minimization of 
a convex modified objective function, the modified objective function consist- 
ing of our original objective function plus a convex penalty function which 
serves to increase the value of our modified objective function whenever the 
constraints are violated, i. e. we will minimize the convex function 


* I = 'Ln + -Wl^^ ) 4, (W-W^^g ) (5. 1. 1) 
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subject to the "soft” linear constraint, shown in Fig 5, 1. 1 below 



Fig. 5.1.1 Constraint and Penalty Function Level Curves

The constraint equation is of the form

$$ W^T n_1 - a = 0 \tag{5.1.2} $$

The convex penalty function we will use is given by

$$ f_1(W) = K_1 \left[ W^T n_1 - a \right]^2 \tag{5.1.3} $$


The level curves of this penalty function are also shown in Fig. 5.1.1.

We should note that if $K_1$ is "large enough" we will always be very "close"
to the line $W^T n_1 - a = 0$, which then may be interpreted as a linear
approximation (i.e. the first terms of a Taylor expansion) at point W to any
arbitrary nonlinear constraint (e.g. the super-gain ratio), provided that as the
algorithm moves from point to point in the W space, we keep replacing the
nonlinear constraint by the best linear approximation to it at each point.

Assuming we have only one constraint in the problem, as given by
equation (5.1.2), we will present three algorithms, corresponding to the
three cases studied in chapter four, i.e. when the gradient is known, when
we have a noise-free estimate of the gradient, and when we have a noisy
estimate of the gradient. For each of these algorithms we will investigate
convergence (convergence of the expected value of the weight vectors
and bounds on the variance of the weight vectors in cases two and three),
the rate of convergence, and the bias between what our "soft" constraint
algorithms converge to and the optimum weight vector when we have a
"hard" constraint, which was found in section 4.2 to be

$$ W_{opt} = W_{LMS} + \frac{a - n_1^T W_{LMS}}{n_1^T \phi^{-1} n_1} \, \phi^{-1} n_1 \tag{5.1.4} $$


All three algorithms seek to minimize the modified convex objective
(j indicates the iteration number)

$$ F_j(W_j) = \xi_{\min} + (W_j - W_{LMS})^T \phi \, (W_j - W_{LMS}) + K_1 \left[ W_j^T n_1 - a \right]^2 \tag{5.1.5} $$


In case 1, the gradient of equation (5.1.5) is

$$ g(W_j) = 2\phi (W_j - W_{LMS}) + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \tag{5.1.6} $$

In case 2, we assume $\phi$ is not available and must be estimated from
$s_j$, $d_j$ and $W_j$, which are available:

$$ g(W_j) = -2 s_j (d_j - s_j^T W_j) + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \tag{5.1.7} $$

In case 3, we assume $s_j$ is not available, but a noisy estimate
$s_j + n_j$ is available:

$$ g(W_j) = -2 (s_j + n_j) \left[ d_j - (s_j^T + n_j^T) W_j \right] + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \tag{5.1.8} $$
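The case 1 gradient (5.1.6) can be checked numerically against the modified objective (5.1.5). The Python sketch below does this for a two-weight example by comparing the closed-form gradient with a central-difference estimate. The numerical values of `PHI`, `W_LMS`, `N1`, `A` and `K1`, and all function names, are illustrative assumptions and do not come from the text.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in m]


def objective(w, phi, w_lms, n1, a, k1, xi_min=0.0):
    """Modified objective (5.1.5): MSE surface plus quadratic penalty."""
    e = [wi - li for wi, li in zip(w, w_lms)]
    return xi_min + dot(e, mat_vec(phi, e)) + k1 * (dot(w, n1) - a) ** 2


def gradient(w, phi, w_lms, n1, a, k1):
    """Case 1 gradient (5.1.6); assumes phi is symmetric."""
    e = [wi - li for wi, li in zip(w, w_lms)]
    violation = dot(w, n1) - a
    return [2.0 * pe + 2.0 * k1 * violation * ni
            for pe, ni in zip(mat_vec(phi, e), n1)]


def numeric_gradient(w, phi, w_lms, n1, a, k1, h=1e-6):
    """Central-difference gradient of the objective, for comparison."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((objective(up, phi, w_lms, n1, a, k1)
                     - objective(down, phi, w_lms, n1, a, k1)) / (2.0 * h))
    return grad


# Hypothetical two-weight example (all numbers are assumptions).
PHI = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite
W_LMS = [1.0, -1.0]
N1 = [0.6, 0.8]                  # unit-length constraint normal
A, K1 = 1.0, 10.0
```

Calling `gradient([0.3, 0.7], PHI, W_LMS, N1, A, K1)` and `numeric_gradient` with the same arguments should give matching vectors up to finite-difference error.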
Section 5.2.1 The Algorithm, Proof of Convergence, and Bounds
on the Rate of Convergence if the Gradient is Known


Using equation (5.1.6) the algorithm is

$$ W_{j+1} = W_j - k \left\{ 2\phi (W_j - W_{LMS}) + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \right\} \tag{5.2.1.1} $$

The above equations are a set of first order deterministic difference
equations. Let us first solve for the asymptotic value of $W_j$, denoted
by $W_\infty$. Setting $W_{j+1} = W_j = W_\infty$ gives

$$ W_\infty = W_{LMS} - K_1 \left[ W_\infty^T n_1 - a \right] \phi^{-1} n_1 \tag{5.2.1.2} $$

Let

$$ W_\infty = c + d\, n_1 \tag{5.2.1.3} $$

where

$$ c^T n_1 = 0 \tag{5.2.1.4} $$

Remembering that $n_1^T n_1 = 1$ we have

$$ c + d\, n_1 = W_{LMS} - K_1 [\, d - a \,] \phi^{-1} n_1 \tag{5.2.1.5} $$

Multiplying by $n_1^T$ on the left yields

$$ d = \frac{n_1^T W_{LMS} + K_1 a \, (n_1^T \phi^{-1} n_1)}{1 + K_1 (n_1^T \phi^{-1} n_1)} \tag{5.2.1.6} $$

Substituting (5.2.1.6) into (5.2.1.5) yields

$$ c + d\, n_1 = W_{LMS} - K_1 \left[ \frac{n_1^T W_{LMS} + K_1 a \, (n_1^T \phi^{-1} n_1)}{1 + K_1 (n_1^T \phi^{-1} n_1)} - a \right] \phi^{-1} n_1 $$

and

$$ W_\infty = W_{LMS} + \frac{K_1}{1 + K_1 (n_1^T \phi^{-1} n_1)} \left[ a - n_1^T W_{LMS} \right] \phi^{-1} n_1 \tag{5.2.1.7} $$


If we let $K_1 \to \infty$, which means that the penalty function is infinite unless
the weight vector lies exactly on the line $W^T n_1 - a = 0$, $W_\infty$ becomes

$$ W_\infty = W_{LMS} + \frac{a - n_1^T W_{LMS}}{n_1^T \phi^{-1} n_1} \, \phi^{-1} n_1 $$

which is the optimum solution in the "hard" constraint case (see equation
(5.1.4)).
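The relation between the soft-constraint limit (5.2.1.7) and the hard-constraint optimum (5.1.4) can be illustrated numerically. The following Python sketch evaluates both closed forms for a two-weight problem; the matrix `PHI`, the vectors `W_LMS` and `N1`, the level `A`, and the function names are assumptions chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def solve2(m, b):
    """Solve the 2x2 linear system m x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - b[0] * m[1][0]) / det]


def soft_solution(phi, w_lms, n1, a, k1):
    """Steady-state weight vector of the penalty algorithm, eq. (5.2.1.7)."""
    p = solve2(phi, n1)                 # phi^{-1} n1
    beta = dot(n1, p)                   # n1^T phi^{-1} n1
    scale = k1 / (1.0 + k1 * beta) * (a - dot(n1, w_lms))
    return [wl + scale * pi for wl, pi in zip(w_lms, p)]


def hard_solution(phi, w_lms, n1, a):
    """Hard-constraint optimum, eq. (5.1.4)."""
    p = solve2(phi, n1)
    scale = (a - dot(n1, w_lms)) / dot(n1, p)
    return [wl + scale * pi for wl, pi in zip(w_lms, p)]


# Hypothetical example values (assumptions, not taken from the text).
PHI = [[2.0, 0.5], [0.5, 1.0]]
W_LMS = [1.0, -1.0]
N1 = [0.6, 0.8]
A = 1.0
```

For a moderate penalty such as `k1 = 1.0` the soft solution lies between `W_LMS` and the constraint line, while for a very large `k1` it approaches `hard_solution(...)`, which satisfies the constraint exactly.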

By comparing equation (5.2.1.7), which tells us the steady state
value of W that our algorithm converges to, with equation (5.1.4), which
tells us the optimum value that we want to converge to, we can
get an idea of how to choose $K_1$. In the steady state our penalty
algorithm converges to $W_\infty = W_{LMS} + x$, where the direction of the vector
$x$ is the same as the direction of $x_{opt}$, where $W_{opt} = W_{LMS} + x_{opt}$;
however, the magnitude of $x$ is less than the magnitude of $x_{opt}$. If we want this
bias to be less than, say, 1%, we must choose $K_1$ to satisfy

$$ \frac{K_1}{1 + K_1 (n_1^T \phi^{-1} n_1)} \ge 0.99 \, \frac{1}{n_1^T \phi^{-1} n_1} $$

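which implies

$$ K_1 \ge \frac{99}{n_1^T \phi^{-1} n_1} $$

Since $n_1$ has unit length, $n_1^T \phi^{-1} n_1 \ge \rho$, where $\rho$ is the minimum
eigenvalue of $\phi^{-1}$, so choosing $K_1 \ge 99/\rho$ is sufficient.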
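The 1% rule can be written as a small calculation. Comparing (5.2.1.7) with (5.1.4), the soft correction is the hard correction scaled by $K_1\beta/(1+K_1\beta)$ with $\beta = n_1^T\phi^{-1}n_1$. The sketch below, in Python, computes this fraction and the smallest $K_1$ that reaches a requested fraction; the function names and the sample value of `beta` are assumptions for illustration.

```python
def correction_fraction(k1, beta):
    """Ratio |x| / |x_opt| = K1*beta / (1 + K1*beta), beta = n1^T phi^{-1} n1."""
    return k1 * beta / (1.0 + k1 * beta)


def min_penalty_gain(beta, fraction=0.99):
    """Smallest K1 whose soft correction reaches `fraction` of the hard one.

    Solving K1*beta / (1 + K1*beta) = fraction for K1 gives
    K1 = fraction / ((1 - fraction) * beta), i.e. 99 / beta for fraction 0.99.
    """
    return fraction / ((1.0 - fraction) * beta)


# Hypothetical value of n1^T phi^{-1} n1 (an assumption).
BETA = 0.66
K1_MIN = min_penalty_gain(BETA)
```

With `BETA = 0.66` the required gain is 99/0.66 = 150; any larger `K1` leaves a bias below 1%.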
We will now investigate how fast our algorithm converges to $W_\infty$.

Define

$$ q_j = W_j - W_\infty \tag{5.2.1.8} $$

In terms of $q_j$, the algorithm is

$$ q_{j+1} = q_j - k \left\{ 2\phi (q_j + W_\infty - W_{LMS}) + 2K_1 \left[ q_j^T n_1 + W_\infty^T n_1 - a \right] n_1 \right\} \tag{5.2.1.9} $$

$$ q_{j+1} = q_j - 2k\phi q_j - 2kK_1 (q_j^T n_1) n_1 - 2k\phi (W_\infty - W_{LMS}) - 2kK_1 n_1 (n_1^T W_\infty - a) \tag{5.2.1.10} $$

But

$$ -2k\phi (W_\infty - W_{LMS}) = -2kK_1 \, \frac{a - n_1^T W_{LMS}}{1 + K_1 (n_1^T \phi^{-1} n_1)} \, n_1 $$

and

$$ n_1^T W_\infty - a = \frac{n_1^T W_{LMS} - a}{1 + K_1 (n_1^T \phi^{-1} n_1)} $$

so the last two terms of (5.2.1.10) cancel. Equation (5.2.1.9) then becomes

$$ q_{j+1} = q_j - 2k\phi q_j - 2kK_1 n_1 (n_1^T q_j) \tag{5.2.1.11} $$
thus

$$ \| q_{j+1} \| \le \varepsilon \, \| q_j \| \tag{5.2.1.12} $$

where

$$ \varepsilon = \left\| I - 2k (\phi + K_1 n_1 n_1^T) \right\| \tag{5.2.1.13} $$

Note that $\phi + K_1 n_1 n_1^T$ is positive definite symmetric.

Proof: $x^T (\phi + K_1 n_1 n_1^T) x = x^T \phi x + K_1 (x^T n_1)(n_1^T x) > 0$ for $x \ne 0$.

From Goldstein (28), page 24,

$$ \varepsilon = \max \left\{ |1 - 2k\rho_1| , \; |1 - 2k\rho_n| \right\} \tag{5.2.1.14} $$

where $\rho_1$ and $\rho_n$ are the min and max eigenvalues of $(\phi + K_1 n_1 n_1^T)$
respectively. For k small enough

$$ 0 < \varepsilon < 1 \tag{5.2.1.15} $$

Equation (5.2.1.12) shows that the rate of convergence is given
by the number $\varepsilon$, which for k small enough is between zero and one (thus
guaranteeing convergence), and $\varepsilon \to 1$ as $k \to 0$ (i.e. the rate of
convergence becomes slower as $k \to 0$).

In this section we have proven that our algorithm converges to
$W_\infty$ for k sufficiently small. In the next section we will investigate a more
useful algorithm, i.e. an algorithm that does not require a priori knowledge
of $\phi$.
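The bound (5.2.1.12) through (5.2.1.14) can be tried out on a small example. The Python sketch below iterates the known-gradient algorithm (5.2.1.1) for two weights and computes the contraction factor from the eigenvalues of $\phi + K_1 n_1 n_1^T$. The example values and function names are assumptions made only for illustration, and the closed-form eigenvalue routine works for 2 x 2 symmetric matrices only.

```python
import math


def eig_sym2(m):
    """(min, max) eigenvalues of a symmetric 2x2 matrix."""
    half_trace = (m[0][0] + m[1][1]) / 2.0
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(half_trace * half_trace - det, 0.0))
    return half_trace - disc, half_trace + disc


def contraction_factor(phi, n1, k1, k):
    """Rate bound (5.2.1.14) for the matrix phi + K1 n1 n1^T."""
    m = [[phi[i][j] + k1 * n1[i] * n1[j] for j in range(2)] for i in range(2)]
    rho_min, rho_max = eig_sym2(m)
    return max(abs(1.0 - 2.0 * k * rho_min), abs(1.0 - 2.0 * k * rho_max))


def iterate(phi, w_lms, n1, a, k1, k, steps, w0):
    """Run `steps` iterations of the known-gradient algorithm (5.2.1.1)."""
    w = list(w0)
    for _ in range(steps):
        e = [w[i] - w_lms[i] for i in range(2)]
        violation = n1[0] * w[0] + n1[1] * w[1] - a
        g = [2.0 * (phi[i][0] * e[0] + phi[i][1] * e[1])
             + 2.0 * k1 * violation * n1[i] for i in range(2)]
        w = [w[i] - k * g[i] for i in range(2)]
    return w


# Hypothetical example values (assumptions, not taken from the text).
PHI = [[2.0, 0.5], [0.5, 1.0]]
W_LMS = [1.0, -1.0]
N1 = [0.6, 0.8]
A, K1, K = 1.0, 10.0, 0.01
```

For these values the factor is strictly between zero and one, so the distance to the steady-state vector shrinks by at least that factor at every step. Making `K` smaller moves the factor toward one, in line with the remark that convergence slows as k tends to zero.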
Section 5.2.2 The Algorithm, Proof of Convergence, and Bounds
on the Rate of Convergence if the Gradient is Estimated


Using equation (5.1.7) the algorithm is

$$ W_{j+1} = W_j - k \left\{ -2 s_j (d_j - s_j^T W_j) + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \right\} \tag{5.2.2.1} $$


These equations constitute a set of first-order stochastic difference
equations. We will first solve for the asymptotic expected value of $W_j$,
denoted by $W_\infty$.

Taking the expected value of equation (5.2.2.1) yields

$$ E\{W_{j+1}\} = E\{W_j\} - k \left\{ -2E\{s_j d_j\} + 2E\{s_j s_j^T W_j\} + 2K_1 \left[ E\{W_j^T\} n_1 - a \right] n_1 \right\} $$

Noting that $E\{s_j s_j^T W_j\} = E\{s_j s_j^T\} E\{W_j\} = \phi(s,s) E\{W_j\}$ as in
chapter four, we may rewrite this equation as

$$ E\{W_{j+1}\} = E\{W_j\} - k \left\{ -2\phi(s,d) + 2\phi(s,s) E\{W_j\} + 2K_1 \left[ E\{W_j^T\} n_1 - a \right] n_1 \right\} $$

Using equation (4.1.7a)

$$ \phi(s,d) = \phi(s,s) W_{LMS} $$

$$ E\{W_{j+1}\} = E\{W_j\} + 2k\phi \left( W_{LMS} - E\{W_j\} \right) - 2kK_1 \left[ E\{W_j^T\} n_1 - a \right] n_1 $$

We now have a set of deterministic first-order difference equations whose
asymptotic value $W_\infty$ can be found by setting $E\{W_{j+1}\} = E\{W_j\} = W_\infty$,
giving

$$ W_\infty - W_{LMS} = -K_1 \left[ W_\infty^T n_1 - a \right] \phi^{-1} n_1 \tag{5.2.2.2} $$

This is the same as equation (5.2.1.2) and the solution is given by
equation (5.2.1.7):

$$ W_\infty = W_{LMS} + \frac{K_1}{1 + K_1 (n_1^T \phi^{-1} n_1)} \left[ a - n_1^T W_{LMS} \right] \phi^{-1} n_1 \tag{5.2.2.3} $$


Because our difference equations describing the behavior of the weight
vectors are stochastic, the above result is not sufficient to prove
convergence of the weight vectors to $W_\infty$; we must also show that the
variance of the stochastic vectors $q_j = W_j - W_\infty$ is bounded. To do this,
define

$$ q_j = W_j - W_\infty \tag{5.2.2.4} $$

The algorithm (5.2.2.1) may be rewritten in terms of $q_j$ as

$$ q_{j+1} = q_j - k \left[ 2 s_j s_j^T + 2K_1 n_1 n_1^T \right] q_j - k \left[ 2 s_j s_j^T + 2K_1 n_1 n_1^T \right] W_\infty + k \left[ 2 s_j d_j + 2K_1 a \, n_1 \right] $$

Define

$$ H_j \equiv 2 s_j s_j^T + 2K_1 n_1 n_1^T \tag{5.2.2.5} $$

$$ v_j \equiv 2 s_j d_j + 2K_1 a \, n_1 \tag{5.2.2.6} $$

Thus

$$ q_{j+1} = q_j - k \, \underline{\phi}_j \tag{5.2.2.7} $$

where

$$ \underline{\phi}_j \equiv H_j q_j + h_j \tag{5.2.2.8} $$

and

$$ h_j \equiv H_j W_\infty - v_j \tag{5.2.2.9} $$

Note that $E\{H_j\}$ and $E\{h_j\}$ are independent of j. Also $H_j$ and $h_j$ are
statistically independent of $H_k$ and $h_k$ if $j \ne k$, because we assumed that
$s_j$, $s_k$ are statistically independent for $k \ne j$.

Noting that

$$ E\{H_j\} = 2\phi + 2K_1 n_1 n_1^T $$

$$ E\{v_j\} = 2\phi(s,d) + 2K_1 a \, n_1 = 2\phi W_{LMS} + 2K_1 a \, n_1 $$

we may show that

$$ E\{h_j\} = 0 \tag{5.2.2.10} $$

Note that $E\{H_j\} = 2\phi + 2K_1 n_1 n_1^T \equiv \Omega$ is a symmetric positive
definite matrix.

The algorithm is thus

$$ q_{j+1} = q_j - k \, \underline{\phi}_j $$

where

$$ \underline{\phi}_j = H_j q_j + h_j $$

and $H_j$ is a sequence of random n x n matrices; $h_j$ is a sequence of random
n-tuple vectors; the expected values of $H_j$ and $h_j$ were shown to be
independent of j; $H_j$ and $h_j$ are independent of $H_l$ and $h_l$ for $j \ne l$;
$E\{h_j\} = 0$; and the elements of $H_j$ and $h_j$ have finite variance, with
$E\{H_j\} = \Omega$, where $\Omega$ is a symmetric positive definite matrix.

Under these conditions, it is shown in Appendix A of chapter four
that for k sufficiently small

$$ \lim_{j \to \infty} \| E\{q_j\} \| = 0 \tag{5.2.2.11} $$

and

$$ \limsup_{j \to \infty} \| q_j \| \le V(k) \tag{5.2.2.12} $$

where the norm of a random vector u is defined as

$$ \| u \| \equiv \sqrt{E\{u^T u\}} $$

and

$$ \lim_{k \to 0} V(k) = 0 \tag{5.2.2.13} $$

Equation (5.2.2.11) shows again that the random weight vectors converge,
in the mean, to $W_\infty$, and (5.2.2.12) shows that the variance
of the random weight vectors about their expected value is bounded, and
the bound can be made as small as desired by choosing k sufficiently small,
as shown by (5.2.2.13).

The rate of convergence of the mean of the random weight vectors
is shown in the proof of the above theorem to be bounded by $\varepsilon$, where

$$ \varepsilon = \left\| I - k \left( 2\phi + 2K_1 n_1 n_1^T \right) \right\| \tag{5.2.2.14} $$

Since $\Omega = E\{H_j\} = 2\phi + 2K_1 n_1 n_1^T$ is positive definite symmetric,
we have

$$ \varepsilon = \max \left\{ |1 - k\rho_1| , \; |1 - k\rho_n| \right\} \tag{5.2.2.15} $$

where $\rho_1$ and $\rho_n$ are both positive, and represent the minimum and
maximum eigenvalues of $\Omega$ respectively, as shown by Goldstein (28), page 24.

Thus $0 < \varepsilon < 1$ for k sufficiently small, and this again proves convergence
of the algorithm of this section. In the next section we will investigate what
happens when the estimate of the gradient used in this section contains
additive noise.
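The convergence in the mean described above can be illustrated with a small simulation of algorithm (5.2.2.1). The Python sketch below is a toy setup under stated assumptions, not the procedure used in this work: the input vectors are independent unit-variance Gaussian samples (so that $\phi = I$), the desired signal is noise-free (`d = s^T w_true`, so that `w_true` plays the role of $W_{LMS}$), and the step size, penalty gain, seed and function names are all hypothetical.

```python
import random


def run_penalty_lms(k=0.005, k1=5.0, steps=200000, seed=1):
    """Iterate (5.2.2.1) and return the weight vector averaged over the
    last quarter of the run, as an estimate of the steady-state mean."""
    rng = random.Random(seed)
    w_true = [1.0, -1.0]          # equals W_LMS in this noise-free toy setup
    n1, a = [0.6, 0.8], 1.0       # unit-length normal and constraint level
    w = [0.0, 0.0]
    tail = steps // 4
    tail_sum = [0.0, 0.0]
    for j in range(steps):
        s = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        d = s[0] * w_true[0] + s[1] * w_true[1]
        err = d - (s[0] * w[0] + s[1] * w[1])
        violation = n1[0] * w[0] + n1[1] * w[1] - a
        g = [-2.0 * s[i] * err + 2.0 * k1 * violation * n1[i] for i in range(2)]
        w = [w[i] - k * g[i] for i in range(2)]
        if j >= steps - tail:
            tail_sum = [tail_sum[i] + w[i] for i in range(2)]
    return [t / tail for t in tail_sum]


def predicted_mean(k1=5.0):
    """W_inf from (5.2.2.3) for phi = I, where phi^{-1} n1 = n1 and
    n1^T phi^{-1} n1 = 1."""
    w_lms, n1, a = [1.0, -1.0], [0.6, 0.8], 1.0
    gap = a - (n1[0] * w_lms[0] + n1[1] * w_lms[1])
    scale = k1 / (1.0 + k1) * gap
    return [w_lms[i] + scale * n1[i] for i in range(2)]
```

The time-averaged weights returned by `run_penalty_lms()` should lie close to `predicted_mean()`, and the scatter of the individual iterates about that value shrinks as `k` is reduced, which is the behavior stated by (5.2.2.12) and (5.2.2.13).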
Section 5.2.3 The Algorithm, Proof of Convergence, and Bounds on
the Rate of Convergence if the Gradient is Estimated,
and the Estimate is Noisy


Using (5.1.8) the algorithm is

$$ W_{j+1} = W_j - k \left\{ -2 (s_j + n_j) \left[ d_j - (s_j^T + n_j^T) W_j \right] + 2K_1 \left[ W_j^T n_1 - a \right] n_1 \right\} \tag{5.2.3.1} $$

These equations constitute a set of first-order stochastic difference
equations. We will first solve for the asymptotic expected value of
$W_j$, which we will denote by $W_\infty$.

Taking the expected value of (5.2.3.1), under the assumption that
$E\{n_j\} = 0$, $E\{n_j n_j^T\} = \phi_n$, and $s_j$, $s_k$, $n_l$, $n_m$ are statistically
independent for $k \ne j$ and $l \ne m$, we have

$$ E\{W_{j+1}\} = E\{W_j\} - k \left\{ -2\phi(s,d) + 2\phi(s,s) E\{W_j\} + 2\phi_n E\{W_j\} + 2K_1 \left[ E\{W_j^T\} n_1 - a \right] n_1 \right\} $$

Using (4.1.7a) yields

$$ E\{W_{j+1}\} = E\{W_j\} + 2k \left[ \phi W_{LMS} - \phi E\{W_j\} - \phi_n E\{W_j\} \right] - 2kK_1 \left[ E\{W_j^T\} n_1 - a \right] n_1 $$

We now have a set of deterministic first-order difference equations
whose asymptotic value $W_\infty$ can be found by setting
$E\{W_j\} = E\{W_{j+1}\} = W_\infty$:

$$ W_{LMS} - (I + \phi^{-1} \phi_n) W_\infty = K_1 \left[ W_\infty^T n_1 - a \right] \phi^{-1} n_1 \tag{5.2.3.2} $$

Let

$$ W_\infty = c + d\, n_1 \tag{5.2.3.3} $$

where

$$ c^T n_1 = 0 \tag{5.2.3.4} $$

Remembering that $n_1^T n_1 = 1$, (5.2.3.2) becomes

$$ W_{LMS} - (I + \phi^{-1} \phi_n)(c + d\, n_1) = K_1 [\, d - a \,] \phi^{-1} n_1 \tag{5.2.3.5} $$

Multiplying by $n_1^T (I + \phi^{-1} \phi_n)^{-1}$ on the left, and manipulating, gives

$$ d = \frac{n_1^T (I + \phi^{-1} \phi_n)^{-1} W_{LMS} + K_1 a \, n_1^T (I + \phi^{-1} \phi_n)^{-1} \phi^{-1} n_1}{1 + K_1 n_1^T (I + \phi^{-1} \phi_n)^{-1} \phi^{-1} n_1} \tag{5.2.3.6} $$

From (5.2.3.5)

$$ c + d\, n_1 = (I + \phi^{-1} \phi_n)^{-1} \left\{ W_{LMS} - K_1 [\, d - a \,] \phi^{-1} n_1 \right\} $$

Using (5.2.3.6), after some algebra, we get

$$ W_\infty = (I + \phi^{-1} \phi_n)^{-1} \left\{ W_{LMS} + \frac{K_1 \left[ a - n_1^T (I + \phi^{-1} \phi_n)^{-1} W_{LMS} \right]}{1 + K_1 n_1^T (I + \phi^{-1} \phi_n)^{-1} \phi^{-1} n_1} \, \phi^{-1} n_1 \right\} \tag{5.2.3.7} $$

If we let $K_1 \to \infty$ we should get the same solution as equation (4. 4. 3. 9),
because the penalty function is infinite unless the weight vector lies exactly
on the line $W^T n_1 - a = 0$. Under these conditions, the factor multiplying
$\phi^{-1} n_1$ loses its dependence on $K_1$ and we get

$$ W_\infty = (I + \phi^{-1} \phi_n)^{-1} \left\{ W_{LMS} + \frac{a - n_1^T (I + \phi^{-1} \phi_n)^{-1} W_{LMS}}{n_1^T (I + \phi^{-1} \phi_n)^{-1} \phi^{-1} n_1} \, \phi^{-1} n_1 \right\} \tag{5.2.3.8} $$

This is exactly the same as equation (4,3 . 3, 9).
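A numerical check of the steady-state expression can be useful here, since the algebra is lengthy. The Python sketch below evaluates the closed form for $W_\infty$ in the noisy case, written in terms of $(I + \phi^{-1}\phi_n)^{-1}$, and verifies that it satisfies the stationarity condition $\phi W_{LMS} - (\phi + \phi_n) W_\infty = K_1 [W_\infty^T n_1 - a] n_1$ obtained by setting $E\{W_{j+1}\} = E\{W_j\}$. The 2 x 2 example matrices, the symbol $\phi_n$ for the noise correlation matrix, and the function names are assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in m]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def noisy_soft_solution(phi, phi_n, w_lms, n1, a, k1):
    """Steady-state mean weight vector when the gradient estimate is noisy."""
    phi_inv = inv2(phi)
    # big_a = I + phi^{-1} phi_n
    big_a = [[(1.0 if i == j else 0.0)
              + sum(phi_inv[i][l] * phi_n[l][j] for l in range(2))
              for j in range(2)] for i in range(2)]
    a_inv = inv2(big_a)
    p = mat_vec(phi_inv, n1)                    # phi^{-1} n1
    gamma = dot(n1, mat_vec(a_inv, p))          # n1^T big_a^{-1} phi^{-1} n1
    shrunk = mat_vec(a_inv, w_lms)              # big_a^{-1} W_LMS
    scale = k1 / (1.0 + k1 * gamma) * (a - dot(n1, shrunk))
    inner = [w_lms[i] + scale * p[i] for i in range(2)]
    return mat_vec(a_inv, inner)


def stationarity_residual(phi, phi_n, w_lms, n1, a, k1, w):
    """phi W_LMS - (phi + phi_n) w - K1 (w^T n1 - a) n1; zero at W_inf."""
    total = [[phi[i][j] + phi_n[i][j] for j in range(2)] for i in range(2)]
    lhs = [x - y for x, y in zip(mat_vec(phi, w_lms), mat_vec(total, w))]
    violation = dot(w, n1) - a
    return [lhs[i] - k1 * violation * n1[i] for i in range(2)]


# Hypothetical example values (assumptions, not taken from the text).
PHI = [[2.0, 0.5], [0.5, 1.0]]
PHI_N = [[0.1, 0.0], [0.0, 0.1]]
W_LMS = [1.0, -1.0]
N1 = [0.6, 0.8]
A = 1.0
```

With `PHI_N` set to the zero matrix the result reduces to the noise-free expression (5.2.1.7), and for a very large `k1` the returned vector satisfies the constraint to within a small tolerance.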
Again, because our difference equations describing the behavior of
the weight vectors are stochastic, the above result is not sufficient to prove
convergence of the weight vectors to $W_\infty$; we must also show that the
variance of the stochastic vectors $q_j = W_j - W_\infty$ is bounded. To do this,
define

$$ q_j = W_j - W_\infty \tag{5.2.3.9} $$

The algorithm (5.2.3.1) may be rewritten in the form

$$ q_{j+1} = q_j - k \left[ 2 (s_j + n_j)(s_j^T + n_j^T) + 2K_1 n_1 n_1^T \right] q_j - k \left[ 2 (s_j + n_j)(s_j^T + n_j^T) + 2K_1 n_1 n_1^T \right] W_\infty + k \left[ 2 (s_j + n_j) d_j + 2K_1 a \, n_1 \right] $$

Define

$$ H_j \equiv 2 (s_j + n_j)(s_j^T + n_j^T) + 2K_1 n_1 n_1^T \tag{5.2.3.10} $$

$$ v_j \equiv 2 (s_j + n_j) d_j + 2K_1 a \, n_1 \tag{5.2.3.11} $$

$$ q_{j+1} = q_j - k \, \underline{\phi}_j \tag{5.2.3.12} $$

where

$$ \underline{\phi}_j \equiv H_j q_j + h_j \tag{5.2.3.13} $$

and

$$ h_j \equiv H_j W_\infty - v_j \tag{5.2.3.14} $$

Note that $E\{H_j\}$ and $E\{h_j\}$ are independent of j. Also $H_j$ and $h_j$ are
statistically independent of $H_k$ and $h_k$ if $j \ne k$, because we assumed
$s_j$, $s_k$, $n_l$, $n_m$ are statistically independent for $k \ne j$ and $l \ne m$.

Again, as in the last section, it can be shown that

$$ E\{h_j\} = 0 \tag{5.2.3.15} $$

Note that $E\{H_j\} = 2(\phi + \phi_n) + 2K_1 n_1 n_1^T \equiv \Omega$ is a symmetric
positive definite matrix.

The algorithm is thus

$$ q_{j+1} = q_j - k \, \underline{\phi}_j $$

where

$$ \underline{\phi}_j = H_j q_j + h_j $$

and $H_j$ is a sequence of random n x n matrices; $h_j$ is a sequence of random
n-tuple vectors; the expected values of $H_j$ and $h_j$ were shown to be independent
of j; $H_j$ and $h_j$ are independent of $H_l$ and $h_l$ for $j \ne l$; $E\{h_j\} = 0$; and the
elements of $H_j$ and $h_j$ have finite variance, with $E\{H_j\} = \Omega$, where $\Omega$ is a
symmetric positive definite matrix.

Under these conditions, it is shown in Appendix A of chapter four that
for k sufficiently small

$$ \lim_{j \to \infty} \| E\{q_j\} \| = 0 \tag{5.2.3.16} $$

and

$$ \limsup_{j \to \infty} \| q_j \| \le V(k) \tag{5.2.3.17} $$

where the norm of a random vector u is defined as

$$ \| u \| \equiv \sqrt{E\{u^T u\}} $$

and

$$ \lim_{k \to 0} V(k) = 0 \tag{5.2.3.18} $$

Equation (5.2.3.16) shows again that the random weight vectors converge, in
the mean, to $W_\infty$, and (5.2.3.17) shows that the variance of the random weight
vectors about their expected value is bounded, and the bound can be made as
small as desired by choosing k sufficiently small, as shown by (5.2.3.18).

The rate of convergence of the mean of the random weight vectors is
shown in the proof of the above theorem to be bounded by $\varepsilon$, where

$$ \varepsilon = \left\| I - k \left( 2\phi + 2\phi_n + 2K_1 n_1 n_1^T \right) \right\| \tag{5.2.3.19} $$

Since $(2\phi + 2\phi_n + 2K_1 n_1 n_1^T)$ is positive definite symmetric,
we have

$$ \varepsilon = \max \left\{ |1 - k\rho_1| , \; |1 - k\rho_n| \right\} \tag{5.2.3.20} $$

where $\rho_1$ and $\rho_n$ are both positive, and represent the minimum and maximum
eigenvalues of $\Omega$ respectively, as shown by Goldstein (28), page 24.

Thus $0 < \varepsilon < 1$ for k sufficiently small.

Of the two approaches we have developed for adaptively
optimizing the MSE subject to a constraint, the approach in chapter four
represents an entirely new approach to the problem, whereas the approach
in this chapter essentially replaces the constrained problem by
an unconstrained problem. Since stochastic unconstrained problems have
already been well researched, we will not run computer simulations of the
algorithm of this chapter, but will rather concentrate our efforts on the
new algorithm developed in chapter four.
CHAPTER 6
Computer Simulations

In chapter three, we found the optimum SNR that we could achieve
subject to a constraint on the super-gain ratio. Specifically, we showed
that for a linear array of four isotropic detectors spaced $d = 0.8\lambda$ ($0.4\lambda$)
apart, subject to the super-gain constraint $Q = 0.08$ ($0.11$), embedded in
a uniform noise field, with a normalized signal impinging from broadside
(endfire), the best SNR we could get at the array output was 0.187 (0.438).

In this chapter we will simulate a projected gradient algorithm which
automatically makes an array of four isotropic detectors spaced $d = 0.8\lambda$
($0.4\lambda$) apart maximize the average output SNR, subject to the constraint
that the super-gain ratio Q is $\le 0.08$ ($0.11$) when the signal impinges from
broadside (endfire) and the noise is isotropic.

We will again (as in chapter three) assume that the signal and noise
are sufficiently temporally narrowband so that the filter following each
detector can be implemented by only two taps (or attenuators) separated by a
quarter period delay as shown in Fig. 6.2.1 when using the multichannel
filter point of view. This corresponds to Fig. 3.1.2 when using the antenna
point of view.

We will formulate the problem first from the antenna point of view,
i.e. we will write the SNR and super-gain ratio (Q-factor) in terms of the
real and imaginary parts of the detector currents $I_{1r}, I_{1i}, I_{2r}, I_{2i}, \ldots,
I_{4r}, I_{4i}$, and second from the multichannel filter point of view, i.e. we
will write the SNR in terms of $w_1, w_2, \ldots, w_7, w_8$. In agreement with the
results of chapter two, we will observe that $I_{1r}$ is equivalent to $w_1$, $I_{1i}$ is
equivalent to $w_2$, $I_{2r}$ is equivalent to $w_3$, etc. We will then use this
equivalence to write the expression for the super-gain ratio in terms of
$w_1, \ldots, w_8$.
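Section 6.1 Antenna Theory Approach

When the signal is impinging from broadside, the time average signal
power coming out of the array is given by equation (2.1.10)

$$ S = \tfrac{1}{2} \left| I^* V_1 \right|^2 $$

where $V_1$ is given by equation (3.1.14)

$$ V_1 = \mathrm{col}\,[\, 1 \;\; 1 \;\; 1 \;\; 1 \,] $$

Writing I as $\mathrm{col}\,[\, I_{1r} + jI_{1i}, \; I_{2r} + jI_{2i}, \; I_{3r} + jI_{3i}, \; I_{4r} + jI_{4i} \,]$,
expanding and then rearranging gives

$$ S = \tfrac{1}{2} \, I^T \begin{bmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1
\end{bmatrix} I \tag{6.1.1} $$

where now

$$ I = \mathrm{col}\,[\, I_{1r} \;\; I_{1i} \;\; I_{2r} \;\; I_{2i} \;\; I_{3r} \;\; I_{3i} \;\; I_{4r} \;\; I_{4i} \,] \tag{6.1.2} $$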
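The passage from the complex four-element form to the real eight-element form can be checked mechanically. The Python sketch below builds, for any steering vector $V_1$, the real matrix M for which $x^T M x = |I^* V_1|^2$ when x stacks the real and imaginary parts of the currents as in (6.1.2). The common factor of one half is omitted, and the function names and test currents are assumptions for illustration.

```python
import cmath


def real_quadratic_matrix(v):
    """Real 2n x 2n matrix M with x^T M x = |I^* v|^2, where x interleaves
    the real and imaginary parts of the complex current vector I."""
    n = len(v)
    m = [[0.0] * (2 * n) for _ in range(2 * n)]
    for a in range(n):
        for b in range(n):
            c = complex(v[a]) * complex(v[b]).conjugate()
            # 2x2 block acting on (I_ar, I_ai) from the left
            # and (I_br, I_bi) from the right
            m[2 * a][2 * b] = c.real
            m[2 * a][2 * b + 1] = -c.imag
            m[2 * a + 1][2 * b] = c.imag
            m[2 * a + 1][2 * b + 1] = c.real
    return m


def stack_real_imag(currents):
    """col[I_1r, I_1i, I_2r, I_2i, ...] as in (6.1.2)."""
    out = []
    for i in currents:
        out.extend([i.real, i.imag])
    return out


def quadratic_form(m, x):
    """Evaluate x^T m x."""
    size = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(size) for j in range(size))
```

For the broadside vector `[1, 1, 1, 1]` the routine reproduces the alternating pattern of ones and zeros shown in (6.1.1). The same routine accepts a complex steering vector, for example one built with `cmath.exp`, which is how an endfire signal matrix could be formed.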


Assuming the noise field is uniform, as in chapter three, the time
average noise power coming out of the array is given by equation (2.1.12)

$$ N = \tfrac{1}{2} \, I^* A \, I $$

where A is given by (3.1.11).

This expression can be manipulated into

$$ N = \tfrac{1}{2} \, I^T E \, I \tag{6.1.3} $$

where I is given by (6.1.2) and E is the real 8 x 8 matrix of equation (6.1.4).

In terms of this eight dimensional I vector, the Q factor is given
by (see equation (3.1.13))

$$ Q = \frac{I^T I}{I^T E \, I} \tag{6.1.5} $$

If the signal impinges from endfire the only quantity that changes
in the above formulation is the time average signal power S (the noise power
and the Q factor are the same as for the broadside signal case). Now

$$ S = \tfrac{1}{2} \left| I^* V_1 \right|^2 $$

where from (3.1.15)

$$ V_1 = \mathrm{col}\left[\, e^{\,j(-3\pi d/\lambda)}, \; e^{\,j(-\pi d/\lambda)}, \; e^{\,j(\pi d/\lambda)}, \; e^{\,j(3\pi d/\lambda)} \,\right] $$

This expression may be manipulated into

$$ S = \tfrac{1}{2} \, I^T F \, I \tag{6.1.6} $$

where I is given by (6.1.2) and F is the real 8 x 8 matrix of equation (6.1.7).
Section 6.2 Multichannel Filter Approach

Let us now find the time average output signal power due to a
deterministic signal generated by a far field point source (see Fig. 6.2.1).

At each detector, the signal is given by

$$ \mathrm{Re}\left\{ e^{\,j\omega t} \, e^{-j \frac{2\pi}{\lambda} u_0 \cdot r_i} \right\} = \cos\left( \omega t - \tfrac{2\pi}{\lambda} \, u_0 \cdot r_i \right) \tag{6.2.1} $$

The output y(t) due to the signal is

$$ y(t) = [\, w_1 \; w_2 \; w_3 \; w_4 \; w_5 \; w_6 \; w_7 \; w_8 \,]
\begin{bmatrix}
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_1) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_1 - \omega\Delta) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_2) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_2 - \omega\Delta) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_3) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_3 - \omega\Delta) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_4) \\
\cos(\omega t - \frac{2\pi}{\lambda} u_0 \cdot r_4 - \omega\Delta)
\end{bmatrix} = W^T a \tag{6.2.2} $$

[Figure: four detector channels. In each channel the detector output feeds one
tap directly and a second tap through a TIME DELAY $\Delta$; the eight tap inputs are
$s_1(t) + n_1(t), \; s_2(t) + n_2(t), \; \ldots, \; s_8(t) + n_8(t)$.]

Fig. 6.2.1 Processor Structure

The signal output power is

$$ S(t) = y^2(t) = W^T a \, a^T W $$

For the case of a broadside signal, $u_0 \cdot r_1 = u_0 \cdot r_2 = u_0 \cdot r_3 = u_0 \cdot r_4 = 0$.
Letting $d = \cos \omega t$, $e = \cos(\omega t - \omega\Delta)$, the matrix $a \, a^T$ is given by

$$ a \, a^T = \begin{bmatrix}
d^2 & de & d^2 & de & \cdots \\
de & e^2 & de & e^2 & \cdots \\
d^2 & de & d^2 & de & \cdots \\
de & e^2 & de & e^2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix} \quad (8 \times 8) $$

Since

$$ \frac{\omega}{2\pi} \int_0^{2\pi/\omega} \cos^2 \omega t \; dt = \tfrac{1}{2}
\qquad \text{and} \qquad
\frac{\omega}{2\pi} \int_0^{2\pi/\omega} \cos \omega t \, \cos(\omega t - \omega\Delta) \; dt = \tfrac{1}{2} \cos \omega\Delta $$

the time average signal power output is given by $W^T R \, W$ where the matrix
R is, with $c = \cos \omega\Delta$,

$$ R = \tfrac{1}{2} \begin{bmatrix}
1 & c & 1 & c & 1 & c & 1 & c \\
c & 1 & c & 1 & c & 1 & c & 1 \\
1 & c & 1 & c & 1 & c & 1 & c \\
c & 1 & c & 1 & c & 1 & c & 1 \\
1 & c & 1 & c & 1 & c & 1 & c \\
c & 1 & c & 1 & c & 1 & c & 1 \\
1 & c & 1 & c & 1 & c & 1 & c \\
c & 1 & c & 1 & c & 1 & c & 1
\end{bmatrix} \tag{6.2.4} $$
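The two time averages used to form R can be confirmed numerically. The Python sketch below averages the products over one period using equally spaced samples, which is exact (up to rounding) for these low-order trigonometric products. The frequency value and the function names are assumptions chosen for illustration.

```python
import math


def time_average(f, omega, samples=1000):
    """Average f(t) over one period 2*pi/omega using equally spaced samples."""
    period = 2.0 * math.pi / omega
    return sum(f(i * period / samples) for i in range(samples)) / samples


def r_entries(omega, delta):
    """Return the two distinct entries of R:
    the average of cos^2(wt) and of cos(wt) * cos(wt - w*delta)."""
    same = time_average(lambda t: math.cos(omega * t) ** 2, omega)
    cross = time_average(
        lambda t: math.cos(omega * t) * math.cos(omega * t - omega * delta),
        omega)
    return same, cross


# Hypothetical carrier frequency (an assumption) and a quarter-period delay.
OMEGA = 2.0 * math.pi * 50.0
QUARTER_PERIOD = (2.0 * math.pi / OMEGA) / 4.0
```

`r_entries(OMEGA, QUARTER_PERIOD)` gives one half for the equal-tap entries and a cross term that vanishes to rounding error, so the off-pattern entries of R drop out for a quarter-period delay.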
Since $\cos \omega\Delta = \cos \frac{\pi}{2} = 0$, this expression for the average signal
power becomes identical to equation (6.1.1) with the vector W replacing
the vector I of equation (6.1.2). Similarly we can show that the expressions
representing the time average noise power in terms of I and W are identical
if we replace I by W, i.e.

$$ N = \tfrac{1}{2} \, W^T E \, W \tag{6.2.5} $$

where E is given by (6.1.4),

$$ Q = \frac{W^T W}{W^T E \, W} \tag{6.2.6} $$

and, if the signal impinges from endfire,

$$ S = \tfrac{1}{2} \, W^T F \, W \tag{6.2.7} $$

where F is given by (6.1.7).
Section 6.3 Maximization of SNR Subject to $Q \le q$


The reason we went through two separate formulations of the same
physical problem in sections 6.1 and 6.2 is as follows: In the W formulation
the numerator matrix in the expression for the SNR is of rank two, and this
makes it impossible for us to conclude from this formulation that the SNR
is a concave function of the W's, and hence possesses a unique maximum.
However, by using the complex I formulation, we will be able to show that
there exists one unique value of I (and hence by our analogy, one unique
value of W) which maximizes the SNR. The proof is as follows:


By equation (2.1.13)

$$ \mathrm{SNR} = \frac{I^* V_1 V_1^* I}{I^* A \, I} \tag{6.3.1} $$

Let us take the first variation of the SNR with respect to the complex
vector I and set it equal to zero to find the possible extreme points.

$$ \delta(\mathrm{SNR}) = \frac{(I^* A I) \left[ (\delta I^*) V_1 V_1^* I + I^* V_1 V_1^* (\delta I) \right] - (I^* V_1 V_1^* I) \left[ (\delta I^*) A I + I^* A (\delta I) \right]}{(I^* A I)^2} = 0 \tag{6.3.2} $$

Letting

$$ y \equiv (I^* A I) \, V_1 V_1^* I - (I^* V_1 V_1^* I) \, A I \tag{6.3.3} $$

equation (6.3.2) becomes after rearranging

$$ y^* \, \delta I + \delta I^* \, y = 0 \tag{6.3.4} $$

Since this equation must hold for arbitrary $\delta I$ where I is complex,
(6.3.4) implies that $y = 0$, which implies

$$ (I^* A I)(V_1^* I) \, V_1 = (I^* V_1)(V_1^* I) \, A I \tag{6.3.5} $$

This equation is satisfied if $(I^* V_1) = 0$, which would mean that equation
(6.3.1) was zero, obviously a minimum value, or if

$$ I = \frac{(I^* A I)}{(I^* V_1)} \, A^{-1} V_1 \tag{6.3.6} $$

This value of I gives the unique maximum of the SNR.

There is also only one unique minimum. Corresponding to these
two values of I, there is a unique value of W which maximizes the SNR,
and one unique value which minimizes the SNR.

It is easy to prove that the set of points W which satisfy $Q(W) \le q$
is star connected about $W = 0$, by observing that if $Q(W) \le q$, then
$x = \lambda W + (1 - \lambda) \, 0$ with $0 < \lambda \le 1$, i.e. $x = \lambda W$, also satisfies
$Q(x) \le q$. This star connectedness is a consequence of the fact that the
Q factor is independent of the magnitude of W.
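The scale invariance behind the star connectedness argument is easy to see in a few lines of Python. The sketch below evaluates $Q(W) = W^T W / W^T E W$ for a weight vector and for scaled copies of it; the 2 x 2 matrix `E_EXAMPLE`, the vector `W_EXAMPLE` and the function names are hypothetical stand-ins, not the matrices of this chapter.

```python
def quad(m, w):
    """Quadratic form w^T m w."""
    size = len(w)
    return sum(w[i] * m[i][j] * w[j] for i in range(size) for j in range(size))


def q_factor(w, e):
    """Super-gain ratio Q(W) = W^T W / W^T E W."""
    return sum(x * x for x in w) / quad(e, w)


def scaled(w, lam):
    """The point x = lam * W on the segment from the origin to W."""
    return [lam * x for x in w]


# Hypothetical positive definite matrix and weight vector (assumptions).
E_EXAMPLE = [[2.0, 0.6], [0.6, 1.0]]
W_EXAMPLE = [1.0, -0.5]
```

Because numerator and denominator both scale by the square of `lam`, `q_factor(scaled(W_EXAMPLE, lam), E_EXAMPLE)` equals `q_factor(W_EXAMPLE, E_EXAMPLE)` for every nonzero `lam`, so every point on the segment toward the origin stays feasible whenever W itself is feasible.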

Because the region $Q(W) \le q$ is connected and the objective
function SNR(W) is concave, our projection algorithm will converge
to the constrained maximum, which occurs at the unconstrained maximum
of the SNR, or on the boundary of the feasible region (in the broadside and
endfire cases under study, we know that the unconstrained maximum of the
SNR lies outside the feasible region by the graphs in chapter three).

Since the solution to the problem of maximizing the SNR subject to the
constraint $Q \le q$ lies on the boundary (i.e. $Q = q$), the Lagrange solution we
found in chapter three is also the solution we should wind up with in this
chapter.
Section 6.4 The Gradient Projection Algorithm

The function to be maximized is $\mathrm{SNR} = \dfrac{W^T F W}{W^T E W}$ subject to the
constraint $Q = \dfrac{W^T W}{W^T A W} \le q$. Note that since the signal direction is
assumed known to us (i.e. F is known), we never need to know the signal
itself (as opposed to needing $d_j$ when we used a MSE criterion in chapter four).
We will investigate three cases:

1. The spatial distribution of the noise is known a priori (i.e. the elements
of the matrix E are known) and there is no additive self-noise associated with
each detector.

2. The spatial distribution of the noise is unknown (i.e. E must be
estimated from observations of the detector outputs when there is no signal
present) and there is no additive self-noise associated with each detector.

3. The spatial distribution of the noise is unknown and there is additive
self-noise associated with each detector.

Before we describe the algorithm, note that the gradient of the SNR
is given by

$$ \nabla_W(\mathrm{SNR}) = \frac{-(W^T F W) \, 2EW + (W^T E W) \, 2FW}{(W^T E W)^2} \tag{6.4.1} $$

Also note that the normal to the hyperplane tangent to the surface
$Q = \dfrac{W^T W}{W^T A W} = q$ is given by

$$ n = \frac{-(W^T W) \, 2AW + (W^T A W) \, 2W}{(W^T A W)^2} \tag{6.4.2} $$
Our algorithm works as follows: We start at any arbitrary value
$W_0$ ($W_0 = \mathrm{col}\,[\,1\;1\;1\;1\;1\;1\;1\;1\,]$). We check to see if $W_0$ satisfies the
constraint (if it does not, we keep moving in the direction $-n$, i.e.
$W_{i+1} = W_i - k\,n$, until we arrive at a value of W which does satisfy the
constraint). In case 1, we try to move in the direction given by the gradient, i.e.

$$ W_{j+1} = W_j + k \, \nabla_W(\mathrm{SNR}) \tag{6.4.3} $$

where $\nabla_W(\mathrm{SNR})$ is given by (6.4.1). We next check $W_{j+1}$ to make sure
it satisfies the constraint. If it does, we continue our iterations as given
by equation (6.4.3) indefinitely. If, on the other hand, $W_{j+1}$ does not satisfy
the constraint, we form a different $W_{j+1}$ given by

$$ W_{j+1} = W_j + k \, P \, \nabla_W(\mathrm{SNR}) \tag{6.4.4} $$

where P, the projection matrix, is given by $I - n n^T$ and n is given by
(6.4.2). Provided k is "small enough," this value of $W_{j+1}$ will always
satisfy the constraint and give a higher value of SNR than $W_j$, because we
are projecting the gradient into the hyperplane tangent to the constraint as
shown in Fig. 6.4.1.

Fig. 6.4.1 Gradient Projection Operation
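One iteration of the procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the simulation program used in this chapter: the normal n is scaled to unit length before forming $P = I - n n^T$ (an assumption, needed for P to be a projection), the matrices passed in are placeholders supplied by the caller, and all function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [dot(row, v) for row in m]


def ratio(w, num, den):
    """Quotient w^T num w / w^T den w (SNR or Q, depending on the matrices)."""
    return dot(w, mat_vec(num, w)) / dot(w, mat_vec(den, w))


def ratio_gradient(w, num, den):
    """Gradient of the quotient; the pattern shared by (6.4.1) and (6.4.2)."""
    nw, dw = mat_vec(num, w), mat_vec(den, w)
    top, bottom = dot(w, nw), dot(w, dw)
    return [(2.0 * bottom * x - 2.0 * top * y) / bottom ** 2
            for x, y in zip(nw, dw)]


def project(g, normal):
    """Apply P = I - n n^T with n scaled to unit length (an assumption)."""
    length = math.sqrt(dot(normal, normal))
    n = [x / length for x in normal]
    gn = dot(g, n)
    return [gi - gn * ni for gi, ni in zip(g, n)]


def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def step(w, f, e, a, q, k):
    """One iteration: the move (6.4.3) if it stays feasible,
    otherwise the projected move (6.4.4)."""
    g = ratio_gradient(w, f, e)
    trial = [wi + k * gi for wi, gi in zip(w, g)]
    if ratio(trial, identity(len(w)), a) <= q:
        return trial
    pg = project(g, ratio_gradient(w, identity(len(w)), a))
    return [wi + k * pi for wi, pi in zip(w, pg)]
```

The projected direction returned by `project` has no component along n, and its inner product with the gradient is non-negative, so to first order the projected move does not decrease the SNR.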
The reason k must be "small enough" is intuitively clear from the
same figure. If we move too far along the hyperplane tangent to the
constraint at $W_j$, we may not satisfy the constraint at $W_{j+1}$. In order to
resolve this problem in our simulations, we chose k so as to make the square
of the norm of $k \, \nabla_W(\mathrm{SNR})$ equal to 0.001 times the square of the norm of
$W_j$, i.e.

$$ k = \sqrt{0.001} \; \frac{\| W_j \|}{\| \nabla_W(\mathrm{SNR}) \|} \tag{6.4.5} $$
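The step-size rule can be written as a short helper. The Python sketch below computes k so that the squared norm of the weight change equals a chosen fraction (0.001 in the text) of the squared norm of the current weight vector; the function name and the sample vectors are assumptions for illustration.

```python
import math


def normalized_step(w, grad, fraction=0.001):
    """Step size k with ||k * grad||^2 = fraction * ||w||^2, as in (6.4.5)."""
    grad_norm = math.sqrt(sum(g * g for g in grad))
    w_norm = math.sqrt(sum(x * x for x in w))
    return math.sqrt(fraction) * w_norm / grad_norm


# Hypothetical weight vector and gradient (assumptions).
W_EXAMPLE = [3.0, 4.0]
GRAD_EXAMPLE = [0.6, 0.8]
K_EXAMPLE = normalized_step(W_EXAMPLE, GRAD_EXAMPLE)
```

With this choice each update changes the weight vector by a fixed relative amount (about 3.2% of its length), regardless of how large or small the gradient happens to be.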


In case 2, where the noise correlation matrix E is unknown, for
each element $E_{ij} = E\{ n_i(t) \, n_j(t) \}$ of the matrix E we substituted the
instantaneous value of the correlation, i.e. $E \to \hat{E}$ where $\hat{E}_{ij}$ at iteration
k is given by $n_i(t_k) \, n_j(t_k)$ (see Fig. 6.2.1). In chapter four we proved that
we would get convergence by using this substitution if our criterion was
to minimize the MSE subject to a linear constraint.

In case 3 we substituted the matrix $\hat{E}$ for the matrix E in (6.4.1),
where $\hat{E}_{ij}$ at iteration k is given by
$\hat{E}_{ij} = [\, n_i(t_k) + \xi_i(t_k) \,] \, [\, n_j(t_k) + \xi_j(t_k) \,]$, where $\xi_i(t_k)$ is white
gaussian noise of variance 0.1.

To generate the vector random variables $n_k$ such that
$E\{ n_k n_k^T \} = E$, we did the following: E is a positive definite matrix, so that
it possesses a square root; call the square root matrix $E^{1/2}$, where
$E^{1/2} E^{1/2} = E$. We generated a vector random variable V, all of whose
components were zero mean independent gaussian random variables with
variance one. Then

$$ n_k = E^{1/2} \, V \tag{6.4.6} $$

and $n_k$ satisfies

$$ E\{ n_k n_k^T \} = E\{ E^{1/2} V V^T E^{1/2} \} = E^{1/2} \, E\{ V V^T \} \, E^{1/2} = E^{1/2} \, I \, E^{1/2} = E $$

as required.
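The noise generation step can be sketched in Python as follows. Note one substitution, made for simplicity: instead of the symmetric square root $E^{1/2}$ used above, the sketch uses the lower-triangular Cholesky factor L with $L L^T = E$, which gives vectors with the same covariance by the same argument. The 2 x 2 matrix, the seed and the function names are assumptions for illustration.

```python
import math
import random


def cholesky(e):
    """Lower-triangular L with L L^T = e, for a positive definite matrix e."""
    size = len(e)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(e[i][i] - partial)
            else:
                low[i][j] = (e[i][j] - partial) / low[j][j]
    return low


def correlated_sample(low, rng):
    """Draw n = L V, with V a vector of independent unit-variance gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][j] * v[j] for j in range(i + 1))
            for i in range(len(low))]


def sample_covariance(low, count, seed=0):
    """Average of n n^T over `count` draws, to compare with the target."""
    rng = random.Random(seed)
    size = len(low)
    acc = [[0.0] * size for _ in range(size)]
    for _ in range(count):
        n = correlated_sample(low, rng)
        for i in range(size):
            for j in range(size):
                acc[i][j] += n[i] * n[j]
    return [[x / count for x in row] for row in acc]


# Hypothetical target correlation matrix (an assumption).
E_TARGET = [[2.0, 0.6], [0.6, 1.0]]
```

`sample_covariance(cholesky(E_TARGET), 100000)` should be close to `E_TARGET`, with the agreement improving as the number of draws grows.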
We simulated the aforementioned three cases for a signal impinging
from both broadside and endfire and obtained the results shown in
Figures 6.4.2 - 6.4.7. Note that in case 1, where the E matrix, and
hence the gradient, was known, we used k = 0.5, and we did not normalize
k by equation (6.4.5).

By comparing Figs. 6.4.3 to 6.4.4 and 6.4.6 to 6.4.7, it can be
seen that, as expected, the algorithm converges to the constrained optimal
value faster, and there is less variance about the optimal value, when
there is no additive detector noise present.
Section 6.5 Conclusions

We have presented and analyzed two stochastic gradient algorithms, 
which can be used to find a constrained optimum point for a concave or 
convex objective function subject to constraints which form a connected 
region, even when we do not have the objective function available, but 
only have a noisy estimate of the objective function. When the constraints 
consisted of only one linear constraint, we proved convergence to the 
constrained optimum value and bounded the rate of convergence of the 
algorithms to the constrained optimum value. 